RATE OF RELAXATION FOR A MEAN-FIELD
ZERO-RANGE PROCESS

By Benjamin T. Graham 

University of British Columbia 

We study the zero-range process on the complete graph. It is a 
Markov chain model for a microcanonical ensemble. We prove that 
the process converges to a fluid limit. The fluid limit rapidly relaxes 
to the appropriate Gibbs distribution. 

1. Introduction. Suppose there are a number of boxes, $N$, each containing
$R$ indistinguishable balls. At rate $N$, do the following. Pick a source box
and a sink box; do this uniformly at random over the $N^2$ ordered pairs of
boxes. If the source box is not empty, take a ball from there and place it in
the sink box. This is a Markov chain on the set

$$\mathcal{B}_N = \{ b \in \mathbb{N}^N : b_1 + \cdots + b_N = NR \}.$$

The number of balls in the $i$th box is $b_i$. We will write $B_i = B_i(t)$ for the
corresponding random variable.
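The dynamics just described are straightforward to simulate. The following is a minimal Python sketch, not code from the paper; the names `simulate_mfzrp` and `empirical_distribution` are hypothetical. It assumes that events form a Poisson process of rate $N$ (so waiting times are exponential with rate $N$), and that drawing the same box as source and sink leaves the configuration unchanged.

```python
import random


def simulate_mfzrp(N, R, T, seed=0):
    """Simulate the mean-field zero-range process up to time T.

    Events occur at total rate N. At each event a source box and a sink box
    are chosen uniformly over the N^2 ordered pairs; if the source is
    nonempty, one ball moves from the source to the sink.
    Returns the occupancies (b_1, ..., b_N) at time T.
    """
    rng = random.Random(seed)
    boxes = [R] * N                      # every box starts with R balls
    t = rng.expovariate(N)               # time of the first event
    while t < T:
        src = rng.randrange(N)
        dst = rng.randrange(N)
        if boxes[src] > 0:
            boxes[src] -= 1
            boxes[dst] += 1
        t += rng.expovariate(N)
    return boxes


def empirical_distribution(boxes):
    """Return X = (X_0, X_1, ...) with X_k = N_k / N."""
    N = len(boxes)
    X = [0.0] * (max(boxes) + 1)
    for b in boxes:
        X[b] += 1.0 / N
    return X


if __name__ == "__main__":
    X = empirical_distribution(simulate_mfzrp(N=1000, R=5, T=10.0))
    print([round(v, 3) for v in X[:10]])
```

Repeating the run for larger $N$ gives a rough feel for the concentration of $X(t)$ that Theorem 1.1 makes precise.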

We will call this the mean-field zero-range process (MFZRP). The zero-range
process is normally defined on a directed graph, with balls jumping
along the edges. In contrast, our process is implicitly defined on a complete
graph: balls can move from any box to any other box. The transition rate
between neighboring elements of $\mathcal{B}_N$ is $N^{-1}$. The Markov chain is reversible
with respect to the uniform distribution on $\mathcal{B}_N$. In the language of statistical
physics, the equilibrium process can be said to have Bose-Einstein statistics
[6, 9].

The number of ways of putting $b$ indistinguishable balls into $N$ distinguishable
boxes is $\binom{b+N-1}{N-1}$; the number of configurations is $|\mathcal{B}_N| = \binom{NR+N-1}{N-1}$.
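Counting the configurations of the other $N-1$ boxes gives the equilibrium law of a single box under the uniform distribution on $\mathcal{B}_N$: $\mathbb{P}(B_1 = b) = \binom{NR-b+N-2}{N-2} / \binom{NR+N-1}{N-1}$. The Python sketch below (helper names are hypothetical) evaluates this by a ratio recursion and compares it with the geometric distribution $G^R(k) = R^k/(R+1)^{k+1}$ with mean $R$, illustrating the convergence referred to later in this section.

```python
import math


def marginal_b1(N, R):
    """P(B_1 = b) for b = 0, ..., NR under the uniform law on B_N (needs N >= 2).

    Uses P(B_1 = 0) = (N - 1) / (NR + N - 1) and the ratio
    P(B_1 = b + 1) / P(B_1 = b) = (NR - b) / (NR - b + N - 2),
    which avoids forming huge binomial coefficients.
    """
    M = N * R
    p = (N - 1) / (M + N - 1)
    probs = [p]
    for b in range(M):
        p *= (M - b) / (M - b + N - 2)
        probs.append(p)
    return probs


def marginal_b1_exact(N, R, b):
    """The same probability from the binomial-coefficient formula (exact integers)."""
    return math.comb(N * R - b + N - 2, N - 2) / math.comb(N * R + N - 1, N - 1)


def geometric(R, kmax):
    """G^R(k) = R^k / (R + 1)^(k + 1) for k = 0, ..., kmax (mean R)."""
    return [R ** k / (R + 1) ** (k + 1) for k in range(kmax + 1)]


if __name__ == "__main__":
    for N in (10, 100, 10_000):
        print(N, [round(v, 4) for v in marginal_b1(N, 3)[:5]])
    print("G^3", [round(v, 4) for v in geometric(3, 4)])
```

The recursion is used instead of `math.comb` for large $N$ only to keep the numbers small; both agree on small cases.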
Fig. 1. A sample path: the empirical distribution with $R = 20$, $N = 10^{\dots}$.
Let $N_k$ be the number of boxes containing $k$ balls, and let $X_k = N_k/N$.
We will call $X = (X_0, X_1, \ldots)$ the empirical distribution of the process. In
Figure 1, we show a series of snapshots of the empirical distribution as it
evolves with time. The process converges to a fluid limit as $N \to \infty$. The fluid
limit methodology has been applied to a large number of models [2, 7, 15].
For an introduction to fluid limits, and a general fluid limit theorem in
finite dimensions, see [2]. We will use an extension of this theorem to infinite
dimensions ([7], Chapter 2).

Theorem 1.1. Up to any finite time $T$, the empirical distribution converges
exponentially, in sup norm $\|\cdot\|$, to a fluid limit $x$. For $\delta > 0$,

$$\limsup_{N \to \infty} N^{-1} \log \mathbb{P}\Big( \sup_{0 \le t \le T} \|X(t) - x(t)\| > \delta \Big) < 0.$$

We will study the MFZRP by looking at the fluid limit. We saw above that
the equilibrium distribution of $B_1$ converges to a geometric distribution as
$N \to \infty$. We therefore expect the fluid limit to do the same as $t \to \infty$.

Theorem 1.2. Let $G^R$ be the geometric distribution with mean $R$. The
fluid limit converges to $G^R$ exponentially in Kullback-Leibler divergence, and

$$\limsup_{t \to \infty} t^{-1} \log D_{KL}(x(t) \| G^R) = -\Omega(R^{-2}).$$

The motivation for the MFZRP comes from statistical physics. In Section 3,
we introduce the microcanonical ensemble formalism [5, 12]. The fundamental
assumption of statistical physics is that equivalent states have equal
probability. Ensembles were introduced to derive the Boltzmann and Gibbs
distributions from the assumption of equiprobability.

The MFZRP has also been studied in connection with the ZRP on the discrete
torus, $\mathbb{Z}_L^d$ [16]. By constructing "flows" between different graphs,
the rates of relaxation of the corresponding ZRPs can be compared. The
spectral gap of the MFZRP is $\Omega(R^{-2})$ uniformly in $N$.

We can look at the boxes as distinguishable particles, and the balls as
indistinguishable quanta of energy. At equilibrium, the probability that
$(N_0, N_1, \ldots) = (n_0, n_1, \ldots)$ is proportional to the number of configurations
$b \in \mathcal{B}_N$ compatible with $(n_0, n_1, \ldots)$. From this point of view, the MFZRP
is a microcanonical ensemble. The corresponding Gibbs distribution is the
geometric distribution $G^R$.

In Section 4, we describe a simple Markov chain microcanonical ensemble,
the Ehrenfest model [4, 5]. It was proposed as a probabilistic model
for entropy. The Ehrenfests wanted to reconcile the deterministic nature of
Boltzmann's H-theorem with the apparently random microscopic disorder of
matter.

In Section 5, we derive the MFZRP fluid limit. The differential equation
that defines the fluid limit has a unique stationary point, $G^R$. Imagine that
$X_0$, the fraction of empty boxes in the finite MFZRP, is "fixed." Then any
given box behaves like a biased random walk on $\mathbb{N}$. It loses balls at rate 1
(unless it is empty), and gains balls at rate $1 - X_0$. This type of biased random walk is central
to our analysis. In Section 8, we use properties of the biased random walk
to prove bounds on the fluid limit.

An important feature of the biased random walk is that it can be truncated
to a finite Markov chain without changing the "typical" behavior.
In Section 6, we discuss some general convergence techniques for reversible
Markov chains [3, 17]. Coercive inequalities force finite Markov chains to
converge to equilibrium exponentially fast in both $L^2$ distance and Kullback-Leibler
divergence. We consider variants of these inequalities in Section 7.
These will be used in Section 9 to show that the fluid limit converges
exponentially.

In Section 10, we prove the convergence of the empirical distribution to 
the fluid limit. The proof provides a rigorous justification for the fluid limit 
differential equation derived in Section 5. 

Our results are stated for the MFZRP started with $R$ balls in each box,
and the corresponding fluid limit as $N \to \infty$. We can extend the results to
allow different initial distributions of the $NR$ balls. Suppose in the limit as
$N \to \infty$, $\|X(0) - x(0)\| \to 0$, where $x(0)$ is a probability distribution with
mean $R$. The proof of Theorem 1.1 is easily adapted. However, the rate
of convergence of the fluid limit to $G^R$ will depend on the tail of $x(0)$;
$D_{KL}(x \| G^R)$ will decay exponentially only if there is a geometric bound on
the tail of $x(0)$.

A similar dependence on the initial configuration seems to arise when
considering the total variation mixing time for the finite MFZRP. The total
variation mixing time is $\tau_1(1/4) = O(NR^2 \log R)$ [16]. However, we may expect
that the process mixes much more quickly when started with $X(0) = \delta_R$. We
discuss this further in Section 11.



2. Notation. Let $\Omega$ be a countable set. For convenience, we will use the
terms "measure" and "mass function" interchangeably. Let $\delta_\ell$ be the Dirac
measure with its atom at $\ell$:

$$\delta_\ell(k) = \begin{cases} 1, & k = \ell, \\ 0, & \text{otherwise.} \end{cases}$$



We will say that $Q: \Omega \times \Omega \to \mathbb{R}$ is a transition rate matrix (Q-matrix) if for
all $j \neq k$, $Q(j,k) \ge 0$ and for all $k$, $Q(k,k) = -\sum_{j \neq k} Q(k,j)$. Assume now
that $\sup_k |Q(k,k)|$ is finite. The distribution at time $t$ of the continuous-time
Markov chain generated by $Q$ with initial distribution $\mu$ is

$$\mu e^{tQ} = \sum_{m \ge 0} \frac{t^m}{m!}\, \mu Q^m.$$

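Define $\|\mu - \pi\|_{TV}$ to be the total variation distance between measures $\mu$ and
$\pi$ on $\Omega$. We will also need a stronger norm to measure the distance between
distributions on the nonnegative integers with finite means. We will write
$\|\mu - \pi\|_{(1)}$ for the first moment of $|\mu - \pi|$. For probability distributions $\mu$ and $\pi$
on $\mathbb{N}$, $|\mu(0) - \pi(0)| \le \sum_{k \ge 1} |\mu(k) - \pi(k)|$, so

$$\|\mu - \pi\|_{(1)} = \sum_k k\,|\mu(k) - \pi(k)| \ge \frac{1}{2} \sum_k |\mu(k) - \pi(k)| = \|\mu - \pi\|_{TV}.$$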

The Kullback-Leibler divergence from $\mu$ to $\pi$, also called the relative entropy
of $\mu$ with respect to $\pi$, is defined up to a multiplicative factor by

$$D_{KL}(\mu \| \pi) = \sum_k \mu(k) \log \frac{\mu(k)}{\pi(k)}.$$

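We will take the function $\log$ to be the natural logarithm. Define a function
$\phi: (0,\infty) \times [0,\infty) \to \mathbb{R}$,

$$\phi(x,y) = y \log y - x \log x - (y - x) \left.\frac{d}{dt}\, t \log t \right|_{t=x} = y \log \frac{y}{x} - (y - x).$$

If $x = y$, $\phi(x,y) = 0$. By the convexity of $t \mapsto t \log t$, $\phi(x,y) \ge 0$ for all $x, y$.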

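The function is bounded above: $\phi(x,y) \le (x - y)^2/x$. For probability
distributions $\mu$ and $\pi$ we can write

$$D_{KL}(\mu \| \pi) = \sum_k \phi(\pi(k), \mu(k)).$$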
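The norms and divergences above are easy to check numerically. A minimal Python sketch, assuming distributions are stored as equal-length lists indexed by $k = 0, 1, \ldots$ with $\pi(k) > 0$; the helper names are hypothetical. It uses the convention $\|\mu - \pi\|_{TV} = \frac12 \sum_k |\mu(k) - \pi(k)|$ and checks $D_{KL}(\mu \| \pi) = \sum_k \phi(\pi(k), \mu(k))$ together with $0 \le \phi(x,y) \le (x-y)^2/x$.

```python
import math


def tv(mu, pi):
    """Total variation distance (1/2) * sum_k |mu(k) - pi(k)|."""
    return 0.5 * sum(abs(m - p) for m, p in zip(mu, pi))


def first_moment_norm(mu, pi):
    """||mu - pi||_(1) = sum_k k |mu(k) - pi(k)|."""
    return sum(k * abs(m - p) for k, (m, p) in enumerate(zip(mu, pi)))


def kl(mu, pi):
    """D_KL(mu || pi) with the natural log and the convention 0 log 0 = 0."""
    return sum(m * math.log(m / p) for m, p in zip(mu, pi) if m > 0)


def phi(x, y):
    """phi(x, y) = y log(y / x) - (y - x), for x > 0 and y >= 0."""
    return (y * math.log(y / x) if y > 0 else 0.0) - (y - x)


if __name__ == "__main__":
    mu = [0.1, 0.5, 0.3, 0.1]
    pi = [0.4, 0.3, 0.2, 0.1]
    print(tv(mu, pi), first_moment_norm(mu, pi), kl(mu, pi))
```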

Let $\mu$ and $\nu$ be probability distributions on the extended real number line
$\mathbb{R} \cup \{-\infty, +\infty\}$. We will say that $\mu$ is stochastically smaller than $\nu$, written
$\mu \le_{st} \nu$, if the following equivalent [11] conditions are met:

(i) For all $c \in \mathbb{R}$, $\mu(\cdot > c) \le \nu(\cdot > c)$.

(ii) There is a coupling $(X, Y)$ such that $X$ has distribution $\mu$, $Y$ has
distribution $\nu$, and $X \le Y$ almost surely.

We will make use of asymptotic notation: "order less than," $O$; "order equal
to," $\Theta$; and "order greater than," $\Omega$, to describe the asymptotic behavior of
functions. With $c_1$, $c_2$ and $n_0$ positive constants,

if $\forall n > n_0$, $|f(n)| \le c_2 |g(n)|$, write $f(n) = O(g(n))$ as $n \to \infty$,

if $\forall n > n_0$, $c_1 |g(n)| \le |f(n)| \le c_2 |g(n)|$, write $f(n) = \Theta(g(n))$ as $n \to \infty$,

if $\forall n > n_0$, $c_1 |g(n)| \le |f(n)|$, write $f(n) = \Omega(g(n))$ as $n \to \infty$.

Likewise, $f(x) = O(g(x))$ as $x \to 0$ if $f(x) = O(g(x))$ as $1/x \to \infty$. We will
write $f(m,n) = \Theta(g(m) h(n))$ as $n \to \infty$ if:

(i) for all $m$, $f(m,n) = \Theta(h(n))$ as $n \to \infty$,

(ii) we can take the implicit constants $c_1 = \Theta(g(m))$ and $c_2 = \Theta(g(m))$.

For example, the number of balls $NR = \Theta(N)$ as $N \to \infty$, but we can also
write $NR = \Theta(NR)$ as $N \to \infty$ to indicate the dependence on $R$.

3. Microcanonical ensembles. The Gibbs distribution is central to the
study of statistical physics in the discrete setting [5, 12]. Consider a system
whose state space $\Omega$ is a countable set. Suppose that the system has
an energy function, or Hamiltonian, $\mathcal{H}: \Omega \to \mathbb{R}$. The Gibbs distribution at
inverse-temperature $\beta$ is defined by

$$G_\beta(k) = \frac{e^{-\beta \mathcal{H}(k)}}{\sum_{j \in \Omega} e^{-\beta \mathcal{H}(j)}}, \qquad k \in \Omega.$$

A microcanonical ensemble is a collection of $N$ copies of the system, with
states, say, $b_1, \ldots, b_N \in \Omega$. Canonical ensembles were used by Maxwell, Boltzmann
and Gibbs to develop the theory of thermodynamics. This has aroused
some interest in Markov chain models for microcanonical ensembles, such
as the Ehrenfest model [4]. In Boltzmann's H-theorem, entropy always
increases. However, the universe is apparently a time-reversible system.
Reversible Markov chain models have a seemingly paradoxical property. By
ergodicity, any function of the system, such as entropy, that goes up must
also come down.

A microcanonical ensemble evolves with time in such a way that the total
energy,

$$\mathcal{E}_{\rm tot} = \sum_{i=1}^{N} \mathcal{H}(b_i),$$

is conserved. The basic assumption of statistical mechanics is that at
equilibrium, all compatible configurations are equally likely. Define $C(E_{\rm tot})$ to
be the set of configurations on the ensemble compatible with total energy
$E_{\rm tot}$,

$$C(E_{\rm tot}) = \{(b_1, \ldots, b_N) : \mathcal{E}_{\rm tot} = E_{\rm tot}\}.$$

In the context of probability theory, the assumption is that the ensemble is
an irreducible Markov chain on $C(E_{\rm tot})$, and that the equilibrium distribution
is uniform.

Let $N_k = |\{i : b_i = k\}|$ be the multiplicity in the ensemble of state $k$. With
$X_k = N_k/N$, the empirical distribution is $X = (X_k : k \in \Omega)$. The multinomial
coefficient $N!/\prod_k N_k!$ counts the permutations of the ensemble. Under the
equilibrium measure $\pi(\cdot \mid \mathcal{E}_{\rm tot} = E_{\rm tot})$,

$$\pi(X = x \mid \mathcal{E}_{\rm tot} = E_{\rm tot}) = |C(E_{\rm tot})|^{-1} \frac{N!}{\prod_k n_k!}, \qquad x_k = n_k/N. \tag{3.1}$$
Define $S$ to be the information-theoretic entropy of a distribution $p$ on $\Omega$,

$$S(p) = -\sum_{k \in \Omega} p(k) \log p(k).$$

Assume for now that $\Omega$ is finite. The entropy of the empirical distribution
is related to the thermodynamic entropy $\log[N!/\prod_k N_k!]$. The difference
between the entropy of the empirical distribution and the thermodynamic entropy
divided by $N$ vanishes:

$$S(X) - \frac{1}{N} \log \frac{N!}{\prod_k N_k!} \to 0 \qquad \text{as } N \to \infty. \tag{3.2}$$

This is simply by Stirling's approximation,

$$\log n! = n \log(n/e) + O(\log n),$$

which shows that the difference is $O(|\Omega| \log N / N)$.
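A quick numerical illustration of (3.2), offered as a sketch: the ensemble states below are drawn at random from a three-element set (an arbitrary choice made only for illustration), and `math.lgamma(n + 1)` evaluates $\log n!$ without overflow. The helper names are hypothetical.

```python
import math
import random
from collections import Counter


def empirical_entropy(states):
    """S(X) = -sum_k X_k log X_k, where X is the empirical distribution of `states`."""
    N = len(states)
    return -sum((c / N) * math.log(c / N) for c in Counter(states).values())


def log_multinomial_per_copy(states):
    """(1/N) log(N! / prod_k N_k!), using lgamma(n + 1) = log n!."""
    N = len(states)
    counts = Counter(states).values()
    return (math.lgamma(N + 1) - sum(math.lgamma(c + 1) for c in counts)) / N


if __name__ == "__main__":
    rng = random.Random(0)
    for N in (100, 10_000, 200_000):
        states = [rng.choice("abc") for _ in range(N)]
        print(N, empirical_entropy(states) - log_multinomial_per_copy(states))
```

The printed differences are nonnegative and shrink roughly like $\log N / N$, in line with the Stirling estimate.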
Now take inverse-temperature $\beta$ such that $G_\beta$ has energy $E$,

$$E = G_\beta(\mathcal{H}) = \sum_k G_\beta(k) \mathcal{H}(k).$$

For all distributions $p$ that also have energy $p(\mathcal{H}) = E$,

$$S(G_\beta) - S(p) = D_{KL}(p \| G_\beta) = \sum_k \phi(G_\beta(k), p(k)) \ge 0. \tag{3.3}$$

Therefore $G_\beta$ is the maximum entropy distribution. By (3.1) and (3.2), in
the limit as $N \to \infty$, almost all of the equiprobable ensemble configurations
correspond to values of the empirical distribution close to $G_\beta$. Therefore,
by symmetry,

$$\lim_{N \to \infty} \pi(\{b_1 = k\} \mid \mathcal{E}_{\rm tot} = NE) = G_\beta(k).$$
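Identity (3.3) can be checked on a small example. This Python sketch assumes a four-state system with $\mathcal{H}(k) = k$ and an arbitrary $\beta = 0.7$; the perturbation direction $v = (1, -2, 1, 0)$ is chosen because it changes neither the total mass nor the energy, so $p = G_\beta + \varepsilon v$ has the same energy as $G_\beta$. The helper names are hypothetical.

```python
import math


def gibbs(beta, energies):
    """G_beta(k) = exp(-beta * H(k)) / Z for a finite list of energies H(k)."""
    weights = [math.exp(-beta * h) for h in energies]
    Z = sum(weights)
    return [w / Z for w in weights]


def entropy(p):
    """S(p) = -sum_k p(k) log p(k), with 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def kl(p, q):
    """D_KL(p || q) = sum_k p(k) log(p(k) / q(k)); assumes q(k) > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)


if __name__ == "__main__":
    H = [0, 1, 2, 3]                 # linear energy H(k) = k on {0, 1, 2, 3}
    G = gibbs(0.7, H)
    v = [1, -2, 1, 0]                # sum(v) = 0 and sum(k * v_k) = 0
    p = [g + 0.05 * d for g, d in zip(G, v)]
    print(entropy(G) - entropy(p), kl(p, G))
```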

Let us return now to the MFZRP specifically. It is supported on configurations
with $NR$ balls in total, $b_1 + \cdots + b_N = NR$.
The MFZRP therefore is a microcanonical ensemble. The geometric distribution
$G^R$ from Section 1 can be thought of as a Gibbs distribution on state
space $\Omega = \mathbb{N}$ with respect to a linear energy function, say

$$\mathcal{H}: \mathbb{N} \to \mathbb{R}, \qquad \mathcal{H}(k) = k$$

(here $e^{-\beta} = R/(R+1)$). The "support" of the empirical distribution, the set
$\{k \in \mathbb{N} : N_k > 0\}$, has size $O(\sqrt{NR})$, since its distinct elements sum to at most $NR$.
Therefore, the limit (3.2) still holds. Increasing the temperature
(decreasing the inverse-temperature $\beta$) corresponds to increasing the average
number of balls per box.
4. The Ehrenfest model. The Ehrenfest model was introduced in [4] to
demonstrate that a system could be ergodic over one time scale, yet still
appear irreversible over shorter time periods. The original model has inspired
a number of trivial and nontrivial variants. For convenience, we will consider
a continuous-time version. Start with $N$ fair coins in a row. At rate $N$, pick
a coin uniformly at random and toss it. This is a Markov chain on the set
$\{H = \text{heads}, T = \text{tails}\}^N$. The Markov chain is time reversible with respect
to the uniform distribution.

The Ehrenfest model is a microcanonical ensemble with respect to the
set $\Omega = \{H, T\}$ with Hamiltonian $\mathcal{H}(H) = \mathcal{H}(T) = 0$. Let $X_H$, $X_T$ be the
fractions of coins with heads, tails side up, respectively. The entropy of the
system is

$$S(X_H, X_T) = -X_H \log X_H - X_T \log X_T.$$

Suppose we start with the coins all tails side up. This is a highly ordered
state, $S = 0$. At equilibrium, the expected value of $X_H$ is $1/2$. The expected
value of the entropy is close to the maximum entropy $S_{\max} = \log 2$.

A natural way to study the evolution of the system is to take the fluid
limit. Condition on $X_H(t) = x_H$ and $X_T(t) = x_T$. $X_H$ increases by $1/N$
at rate $N x_T/2$; it decreases by $1/N$ at rate $N x_H/2$. The expected change
$dX_H$ over a time period $dt$ is $((x_T - x_H)/2 + O(dt))\, dt$. The fluid limit is the
solution $(x_H, x_T)$ to the differential equations

$$\frac{dx_H}{dt} = \frac{x_T - x_H}{2}, \qquad \frac{dx_T}{dt} = \frac{x_H - x_T}{2},$$

with boundary conditions $x_H(0) = 0$, $x_T(0) = 1$. Up to any fixed time, the
empirical distribution $(X_H, X_T)$ converges in probability to $(x_H, x_T)$,

$$x_H(t) = (1 - e^{-t})/2, \qquad x_T(t) = (1 + e^{-t})/2.$$

The entropy of the fluid limit is initially 0. Maximum entropy $S_{\max} = \log 2$
is obtained in the limit as $t \to \infty$ with $x_H = x_T = 1/2$.
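The fluid limit above can be compared with a direct simulation. This is a minimal Python sketch, not the paper's code (function names are hypothetical): it tracks only the number of heads, picks a coin uniformly at each event of a rate-$N$ Poisson process, and tosses it.

```python
import math
import random


def ehrenfest_heads_fraction(N, T, seed=0):
    """Continuous-time Ehrenfest model started from all tails.

    Events occur at total rate N; each event picks a coin uniformly at random
    and tosses it. Only the number of heads is tracked. Returns X_H(T).
    """
    rng = random.Random(seed)
    heads = 0
    t = rng.expovariate(N)
    while t < T:
        picked_heads = rng.random() < heads / N   # the chosen coin shows heads
        tossed_heads = rng.random() < 0.5         # result of the fair toss
        if picked_heads and not tossed_heads:
            heads -= 1
        elif not picked_heads and tossed_heads:
            heads += 1
        t += rng.expovariate(N)
    return heads / N


def fluid_heads(t):
    """Fluid limit x_H(t) = (1 - e^{-t}) / 2."""
    return (1 - math.exp(-t)) / 2


def entropy(x_heads):
    """S(x_H, x_T) = -x_H log x_H - x_T log x_T with 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in (x_heads, 1 - x_heads) if p > 0)


if __name__ == "__main__":
    for T in (0.5, 1.0, 3.0):
        print(T, ehrenfest_heads_fraction(20_000, T), fluid_heads(T))
```

With $N = 20000$ the simulated fraction typically differs from $x_H(T)$ by a few multiples of $1/\sqrt{N}$.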

Since $|x_H(t) - 1/2| = e^{-t}/2$, the fluid limit suggests $X_H = 1/2 + O(1/\sqrt{N})$
after time $O(\log N)$. The Markov chain is a symmetric random walk on the Hamming
cube. The mixing properties have been studied in great detail [3, 10, 17]. The system
does mix in time $O(\log N)$.

In the case of the Ehrenfest model, the components of the ensemble are
evolving independently. The fluid limit is just the probability distribution
of a single coin. Each coin is a finite (time-homogeneous) Markov chain,
so it is immediate that the fluid limit converges exponentially in Kullback-Leibler
divergence [3]. We will see in the next section that the fluid limit
of the MFZRP is also the distribution of a Markov chain, but a
time-inhomogeneous one.
5. The MFZRP fluid limit. The empirical distribution is the Markov
chain $X(t) = (X_0, X_1, \ldots)$. If a box has $k$ balls at time $t$, it contributes mass $1/N$
to $X_k(t)$. Unless $k = 0$, the box loses balls at rate 1. The box receives
balls from each of the nonempty boxes at rate $N^{-1}$; the combined rate is
$1 - X_0(t)$. Initially all boxes have $R$ balls in them: $X(0) = \delta_R$. The fluid limit $x(t)$
is the limit in probability of the empirical distribution $X(t)$ as $N \to \infty$. In
this section, we describe a time-inhomogeneous Markov chain that provides
an intuitive interpretation of the fluid limit.

In the limit, boxes start with $R$ balls, lose balls at rate 1 and gain them
at rate $1 - x_0(t)$. The rate at which any pair of boxes interact goes to
zero; it is as if each box is interacting with a "mean field" that depends
on time, through $x_0(t)$, but not on the box's occupancy. Define a
time-inhomogeneous Markov chain $C$ on $\mathbb{N}$. Start at $R$ at time 0: let $C(0) = R$.
Stepping left, decreasing $C$ by 1, corresponds to a box losing a ball. Stepping
right, increasing $C$ by 1, corresponds to a box receiving a ball. If at time
$t$, $C(t) > 0$, step left at rate 1. Step right at rate $1 - \mathbb{P}(C(t) = 0)$. The
distribution of $C(t)$ is exactly the fluid limit $x(t)$; the rigorous justification
for this comes in Section 10.

The Markov chain $C$ is controlled by $\mathbb{P}(C(t) = 0) = x_0(t)$. Key to showing
that the fluid limit rapidly approaches $G^R$ is a related class of
time-homogeneous Markov chains. Let $a \in [0,1]$. Call the following walk $\mathrm{BRW}[\mathbb{N}, a]$;
it is a random walk on $\mathbb{N}$ with bias $a$. Step left, unless at 0, at rate 1. Step
right at rate $1 - a$. The transition rate matrix $Q = Q[\mathbb{N}, a]$ is specified by

$$Q(j,k) = \begin{cases} 1, & k = j - 1, \\ 1 - a, & k = j + 1, \\ 0, & k \notin \{j-1, j, j+1\}, \end{cases} \tag{5.1}$$

with the diagonal entries $Q(j,j)$ determined by the requirement that each row sums to zero.

Let $a \in (0,1)$. The Markov chain $\mathrm{BRW}[\mathbb{N}, a]$ is irreducible. The stationary
distribution is $\pi = \pi[\mathbb{N}, a]$, $\pi(k) = a(1-a)^k$. Of course, $\mathrm{BRW}[\mathbb{N}, a]$ is
reversible with respect to $\pi$; for all $j, k$, $\pi(j) Q(j,k) = \pi(k) Q(k,j)$.

Extending our notation, one can describe the inhomogeneous Markov
chain $C$ as $\mathrm{BRW}[\mathbb{N}, x_0(t)]$,

$$\frac{d}{dt} x(t) = x(t)\, Q[\mathbb{N}, x_0(t)], \qquad x(0) = \delta_R. \tag{5.2}$$

The entropy of the fluid limit $x = x(t)$ is

$$S(x) = -\sum_{k \ge 0} x_k \log x_k, \qquad 0 \log 0 = 0.$$

Among distributions on $\mathbb{N}$ with energy (mean) $R$, the distribution with maximum
entropy, the Gibbs distribution, is the geometric distribution $G^R$. By (3.3), the
Kullback-Leibler divergence from $x$ to $G^R$ is the difference between the entropy and
the maximum entropy,

$$D_{KL}(x \| G^R) = S(G^R) - S(x).$$
Lemma 5.1. The unique stationary point with mean $R$ of the differential
equation (5.2) is $x = G^R$. For $t > 0$, the rate of increase of the entropy is

$$\frac{dS}{dt} = \sum_{k \ge 0} \big(x_k(1 - x_0) - x_{k+1}\big) \log \frac{x_k(1 - x_0)}{x_{k+1}}
\ge \sum_{k \ge 0} \frac{\big(x_k(1 - x_0) - x_{k+1}\big)^2}{\max\{x_k(1 - x_0), x_{k+1}\}}. \tag{5.3}$$

As physical intuition demands, the entropy is increasing. Further, $S(x)$ is
strictly increasing unless $x = G^R$. Showing that the fluid limit converges
to $G^R$ in Kullback-Leibler divergence is equivalent to showing that $S(x)$
increases to $S(G^R)$.
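A rough numerical picture of Lemma 5.1. This sketch integrates the fluid-limit equation (5.2), written as $dx_k/dt = m_{k-1} - m_k$ with $m_k = x_k(1 - x_0) - x_{k+1}$ (the flows used in the proof of Lemma 5.1), using the classical Runge-Kutta method on the truncated state space $\{0, \ldots, K-1\}$. The truncation (no flow out of $K-1$), the step size and the time horizon are assumptions made only for illustration, and the function names are hypothetical.

```python
import math


def fluid_rhs(x):
    """dx_k/dt = m_{k-1} - m_k with m_k = x_k (1 - x_0) - x_{k+1} and m_{-1} = 0.

    The state space is truncated to {0, ..., K-1} by setting m_{K-1} = 0, so no
    mass leaves the last site; K should be large enough that x_{K-1} stays tiny.
    """
    K = len(x)
    x0 = x[0]
    m = [x[k] * (1 - x0) - x[k + 1] for k in range(K - 1)] + [0.0]
    return [(m[k - 1] if k > 0 else 0.0) - m[k] for k in range(K)]


def rk4_step(x, dt):
    """One classical Runge-Kutta step for the fluid-limit ODE."""
    k1 = fluid_rhs(x)
    k2 = fluid_rhs([xi + 0.5 * dt * d for xi, d in zip(x, k1)])
    k3 = fluid_rhs([xi + 0.5 * dt * d for xi, d in zip(x, k2)])
    k4 = fluid_rhs([xi + dt * d for xi, d in zip(x, k3)])
    return [xi + dt / 6 * (p + 2 * q + 2 * r + s)
            for xi, p, q, r, s in zip(x, k1, k2, k3, k4)]


def kl_to_geometric(x, R):
    """D_KL(x || G^R) with G^R(k) = R^k / (R+1)^(k+1); entries <= 0 are skipped."""
    return sum(xk * (math.log(xk) - k * math.log(R) + (k + 1) * math.log(R + 1))
               for k, xk in enumerate(x) if xk > 0)


def solve_fluid_limit(R, K=60, T=20.0, dt=0.02, record_every=50):
    """Integrate (5.2) from x(0) = delta_R; return final x and (t, D_KL) samples."""
    x = [0.0] * K
    x[R] = 1.0
    history = [(0.0, kl_to_geometric(x, R))]
    steps = int(round(T / dt))
    for step in range(1, steps + 1):
        x = rk4_step(x, dt)
        if step % record_every == 0:
            history.append((step * dt, kl_to_geometric(x, R)))
    return x, history


if __name__ == "__main__":
    _, hist = solve_fluid_limit(R=2)
    for t, d in hist[::5]:
        print(f"t = {t:5.1f}   D_KL = {d:.6f}")
```

Because the right-hand side sums to zero, the scheme conserves total mass up to rounding; the mean is conserved only up to the (tiny) effect of the truncation.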

To prove that $D_{KL} = D_{KL}(x \| G^R)$ decays exponentially, we must show
that $D_{KL}$ and $dS/dt = -dD_{KL}/dt$ have more or less the same order. We can
measure the Kullback-Leibler divergence from $x$ to geometric distributions
other than $G^R$.



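Lemma 5.2. Let $a \in (0,1)$. The Kullback-Leibler divergence from $x$ to
$\pi[\mathbb{N}, a]$ is minimized by $\pi[\mathbb{N}, a] = G^R$, that is, when $a = 1/(R+1)$.
Taking $a = x_0$, $D_{KL}(x \| \pi[\mathbb{N}, x_0]) \ge D_{KL}(x \| G^R)$:

$$\sum_{k \ge 1} x_k \log \frac{x_k}{x_0(1 - x_0)^k} = \sum_{k \ge 1} \phi\big(x_0(1 - x_0)^k, x_k\big) \ge D_{KL}(x \| G^R). \tag{5.4}$$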

In order to compare the summations in (5.3) with the summations in (5.4), 
we need to consider a truncated, finite version of the biased random walk. 
We also need to obtain certain bounds on the fluid limit. 



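Proof of Lemma 5.1. Define $m_k = x_k(1 - x_0) - x_{k+1}$ to be the flow
from $x_k$ to $x_{k+1}$. For convenience let $m_{-1} = 0$; no boxes have $-1$ balls. The
$k$th coordinate of the fluid limit increases as mass flows from $x_{k-1}$ to $x_k$,
and decreases as mass flows from $x_k$ to $x_{k+1}$,

$$\frac{d}{dt} x_k = m_{k-1} - m_k, \qquad k \in \mathbb{N}.$$

Note that the mass, 1, and mean, $R$, are conserved,

$$\frac{d}{dt} \sum_k x_k = 0, \qquad \frac{d}{dt} \sum_k k x_k = 0.$$

If $dx/dt = 0$, then $m_k = 0$ for all $k$ (by induction, starting from $m_{-1} = 0$), so
$x_{k+1} = (1 - x_0) x_k$ and $x$ is a geometric distribution.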
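Differentiating the entropy, and using $\sum_k dx_k/dt = 0$,

$$\frac{dS}{dt} = -\sum_{k \ge 0} (m_{k-1} - m_k) \log x_k = \sum_{k \ge 0} m_k \log \frac{x_k}{x_{k+1}}.$$

The equality in (5.3) follows as $\sum_{k \ge 0} m_k = 0$. The lower bound on $dS/dt$
follows using

$$\frac{\log a - \log b}{a - b} \ge \frac{1}{\max\{a, b\}}, \qquad a, b > 0, \ a \ne b.$$

Apply the mean value theorem to $x \mapsto \log x$ on the interval from $a$ to $b$.
□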

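Proof of Lemma 5.2. If $x_k = a(1-a)^k$ for all $k$, then $x$ has mean
$(1-a)a^{-1}$ and entropy $S(x) = -\log a - (1-a)a^{-1} \log(1-a)$. By definition,

$$D_{KL}(x \| G^R) = \sum_k x_k \log \frac{x_k}{G^R_k}, \qquad G^R_k = \frac{R^k}{(R+1)^{k+1}}.$$

Using $\sum_k x_k = 1$ and $\sum_k k x_k = R$,

$$D_{KL}(x \| G^R) = \log \frac{a(1-a)^R}{(R+1)^{-1} \big(R/(R+1)\big)^R} + D_{KL}(x \| \pi[\mathbb{N}, a]).$$

For $a \in [0,1]$, $a(1-a)^R$ is maximized by $a = 1/(R+1)$, so the logarithm is at most 0,
with equality only when $a = 1/(R+1)$. □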
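Lemma 5.2 can be checked numerically. In this sketch the distribution $x$ is uniform on $\{0, \ldots, 2R\}$ (an arbitrary choice with mean $R$) and is compared with $\pi[\mathbb{N}, a](k) = a(1-a)^k$ over a grid of values of $a$; the minimizer should be $a = 1/(R+1)$, and the decomposition from the proof should hold exactly. The function names are hypothetical.

```python
import math


def kl_to_geometric_a(x, a):
    """D_KL(x || pi[N, a]) for pi(k) = a (1 - a)^k; x is a finite list on {0, 1, ...}."""
    return sum(xk * (math.log(xk) - math.log(a) - k * math.log(1 - a))
               for k, xk in enumerate(x) if xk > 0)


def best_bias(x, grid=1000):
    """Grid search for the a in (0, 1) minimising D_KL(x || pi[N, a])."""
    candidates = [i / grid for i in range(1, grid)]
    return min(candidates, key=lambda a: kl_to_geometric_a(x, a))


if __name__ == "__main__":
    R = 3
    x = [1 / (2 * R + 1)] * (2 * R + 1)   # uniform on {0, ..., 2R}: mean R
    print(best_bias(x), 1 / (R + 1))
```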

6. Markov chain convergence. Let $\Omega$ be a finite set, with $n = |\Omega|$. Let $Q$ be the
transition rate matrix for an irreducible Markov chain, and assume that the
Markov chain is time reversible with respect to equilibrium probability
measure $\pi$. It is standard [17] that the eigenvectors of $Q$, suitably normalized,
can be turned into an orthonormal basis for $\mathbb{R}^\Omega$. We will work with the left
eigenvectors; they are orthogonal with respect to the inner product

$$\langle f, g \rangle_\pi = \sum_{k \in \Omega} \frac{f(k) g(k)}{\pi(k)}.$$

Label the eigenvalues $0 = \lambda_1 > \lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_n$ with corresponding
normalized eigenvectors $F_1, \ldots, F_n$. The first eigenvector $F_1 = \pi$, the stationary
distribution. For a probability measure $\mu$ on $\Omega$,

$$\mu(k) = \pi(k) + \sum_{i \ge 2} a_i F_i(k), \qquad a_i = \langle \mu, F_i \rangle_\pi.$$

The distance from $\mu$ to $\pi$ can be written in terms of these coefficients,

$$\sum_{k \in \Omega} \frac{(\mu(k) - \pi(k))^2}{\pi(k)} = \sum_{i \ge 2} a_i^2.$$
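The quantity $|\lambda_2|$ is called the spectral gap. Let $\mu_t = \mu e^{tQ}$. The eigenvalue
of $F_i$ with respect to $e^{tQ}$ is $e^{t\lambda_i}$, so

$$\sum_{k \in \Omega} \frac{(\mu_t(k) - \pi(k))^2}{\pi(k)} = \sum_{i \ge 2} a_i^2 e^{2t\lambda_i} \le e^{2t\lambda_2} \sum_{k \in \Omega} \frac{(\mu(k) - \pi(k))^2}{\pi(k)}. \tag{6.1}$$

The uniform time to convergence within $\varepsilon$ in total variation is

$$\tau_1(\varepsilon) = \inf\{t : \forall \mu,\ \|\mu_t - \pi\|_{TV} \le \varepsilon\}.$$

Inequality (6.1) can be used to bound the total variation convergence time,

$$|\delta_\ell e^{tQ}(k) - \pi(k)| \le e^{t\lambda_2} \sqrt{\pi(k)/\pi(\ell)}, \tag{6.2}$$

$$\|\mu_t - \pi\|_{TV} \le e^{t\lambda_2} / \sqrt{\pi_{\min}}, \qquad \pi_{\min} := \min_x \pi(x). \tag{6.3}$$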
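Inequalities (6.1) and (6.3) can be illustrated on the truncated biased random walk $\mathrm{BRW}[n, a]$ of Section 7. This Python sketch computes $\mu e^{tQ}$ by uniformization, $e^{tQ} = \sum_{m \ge 0} e^{-Lt} \frac{(Lt)^m}{m!} P^m$ with $P = I + Q/L$, a standard technique that the paper itself does not use. It takes $\lambda_2 = -|1 - \sqrt{1-a}\, e^{i\pi/n}|^2$ from the eigenvalue formula in the proof of Lemma 8.4. The function names and the cut-off of the Poisson sum are assumptions.

```python
import math


def brw_rate_matrix(n, a):
    """Q[n, a]: step left at rate 1 unless at 0, step right at rate 1 - a unless at n - 1."""
    Q = [[0.0] * n for _ in range(n)]
    for j in range(n):
        if j > 0:
            Q[j][j - 1] = 1.0
        if j < n - 1:
            Q[j][j + 1] = 1.0 - a
        Q[j][j] = -sum(Q[j])              # rows of a Q-matrix sum to zero
    return Q


def stationary(n, a):
    """pi[n, a](k) = a (1 - a)^k / (1 - (1 - a)^n), for a in (0, 1)."""
    z = 1.0 - (1.0 - a) ** n
    return [a * (1.0 - a) ** k / z for k in range(n)]


def lambda2(n, a):
    """-|1 - A_2|^2 with A_2 = sqrt(1 - a) exp(i pi / n), as in the proof of Lemma 8.4."""
    A = math.sqrt(1.0 - a) * complex(math.cos(math.pi / n), math.sin(math.pi / n))
    return -abs(1.0 - A) ** 2


def evolve(mu, Q, t):
    """mu e^{tQ} by uniformization; the Poisson sum is cut far in its upper tail."""
    n = len(mu)
    L = max(-Q[j][j] for j in range(n))   # uniformization rate
    P = [[(1.0 if j == k else 0.0) + Q[j][k] / L for k in range(n)] for j in range(n)]
    m_max = int(L * t + 10.0 * math.sqrt(L * t) + 30)
    weight = math.exp(-L * t)             # Poisson(L t) probability of no jumps
    term = list(mu)
    out = [weight * v for v in term]
    for m in range(1, m_max + 1):
        term = [sum(term[j] * P[j][k] for j in range(n)) for k in range(n)]
        weight *= L * t / m
        out = [o + weight * v for o, v in zip(out, term)]
    return out


def chi2(mu, pi):
    """sum_k (mu(k) - pi(k))^2 / pi(k), the quantity controlled by (6.1)."""
    return sum((m - p) ** 2 / p for m, p in zip(mu, pi))


if __name__ == "__main__":
    n, a = 10, 0.3
    Q, pi = brw_rate_matrix(n, a), stationary(n, a)
    mu0 = [1.0] + [0.0] * (n - 1)
    for t in (0.5, 2.0, 5.0):
        print(t, chi2(evolve(mu0, Q, t), pi), math.exp(2 * t * lambda2(n, a)) * chi2(mu0, pi))
```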

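Log Sobolev inequalities for finite Markov chains are described in [3]. Define
the Dirichlet form, and the Laplacian, for $f, g: \Omega \to \mathbb{R}$,

$$\mathcal{E}(f, g) = \frac{1}{2} \sum_{j,k} \pi(j) Q(j,k) [f(j) - f(k)][g(j) - g(k)],$$

$$\mathcal{L}(f) = \sum_k \pi(k) f(k)^2 \log \frac{f(k)^2}{\|f\|_2^2}, \qquad \|f\|_2^2 = \sum_k \pi(k) f(k)^2.$$

The log Sobolev constant is defined

$$\alpha = \min\left\{ \frac{\mathcal{E}(f, f)}{\mathcal{L}(f)} : \mathcal{L}(f) \neq 0 \right\}.$$

The log Sobolev constant can be used to show convergence in Kullback-Leibler
divergence,

$$\frac{d}{dt} D_{KL}(\mu_t \| \pi) = \sum_k (\mu_t Q)(k) \log \frac{\mu_t(k)}{\pi(k)} \le -4\alpha D_{KL}(\mu_t \| \pi). \tag{6.4}$$

Corollary A.4 of [3] gives a bound for the log Sobolev constant in terms of
the spectral gap and $\pi_{\min}$,

$$\alpha \ge \frac{(1 - 2\pi_{\min}) |\lambda_2|}{\log(1/\pi_{\min} - 1)}. \tag{6.5}$$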

7. Truncating the biased random walk. We will now consider versions of
the biased random walk on sets other than $\mathbb{N}$. For $n \ge 2$, let $\mathrm{BRW}[n, a]$ be
the biased random walk restricted to the set $\{0, 1, \ldots, n-1\}$. The transition
rate matrix $Q[n, a]$ is still specified by (5.1). Unless at the left boundary,
0, step left at rate 1. Unless at the right boundary, $n - 1$, step right at
rate $1 - a$. If $a \in (0,1)$, the stationary distribution $\pi = \pi[n, a]$ is given by
$\pi(k) = a(1-a)^k / (1 - (1-a)^n)$. The stationary distribution is uniform if
$a = 0$.

Now let $a \in (0,1)$. Consider the inequalities for finite Markov chains from
Section 6. Inequality (6.4), containing the log Sobolev constant, becomes

$$\sum_{k=0}^{n-2} [\mu(k)(1-a) - \mu(k+1)] \log \frac{\mu(k)(1-a)}{\mu(k+1)} \ge 4\alpha \sum_{k=0}^{n-1} \mu(k) \log \frac{\mu(k)}{\pi(k)}. \tag{7.1}$$

The spectral gap inequality (6.1) becomes

$$\sum_{k=0}^{n-2} \frac{[\mu(k)(1-a) - \mu(k+1)]^2}{\pi(k+1)} \ge |\lambda_2| \sum_{k=0}^{n-1} \frac{[\mu(k) - \pi(k)]^2}{\pi(k)}.$$

We see in the next section, Lemma 8.4, that $|\lambda_2| \ge a^2/4$. The bound (6.5)
on the log Sobolev constant, Corollary A.4 of [3], gives

$$\alpha \ge \frac{(1 - 2\pi(n-1)) |\lambda_2|}{\log(1/\pi(n-1) - 1)} = \Omega(a n^{-1}) \qquad \text{as } n \to \infty.$$
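The order of the bound from (6.5) for $\mathrm{BRW}[n, a]$ can be seen numerically. This sketch evaluates its right-hand side with $\pi_{\min} = \pi(n-1)$ and with the spectral gap taken from the eigenvalue formula in the proof of Lemma 8.4; logarithms are used so that very small $\pi(n-1)$ does not underflow. As $n \to \infty$ with $a$ fixed, $n$ times the bound should approach $(1 - \sqrt{1-a})^2 / (-\log(1-a))$, which is roughly $a/4$ for small $a$. The function name is hypothetical.

```python
import math


def log_sobolev_lower_bound(n, a):
    """Right-hand side of (6.5) for BRW[n, a], with pi_min = pi(n - 1).

    The spectral gap |1 - A_2|^2 uses A_2 = sqrt(1 - a) exp(i pi / n), from the
    proof of Lemma 8.4.
    """
    A = math.sqrt(1 - a) * complex(math.cos(math.pi / n), math.sin(math.pi / n))
    gap = abs(1 - A) ** 2
    log_pi_min = math.log(a) + (n - 1) * math.log(1 - a) - math.log1p(-(1 - a) ** n)
    pi_min = math.exp(log_pi_min)           # may underflow to 0.0, which is harmless
    log_denominator = -log_pi_min + math.log1p(-pi_min)   # log(1 / pi_min - 1)
    return (1 - 2 * pi_min) * gap / log_denominator


if __name__ == "__main__":
    a = 0.2
    limit = (1 - math.sqrt(1 - a)) ** 2 / (-math.log(1 - a))
    for n in (10, 100, 1000, 10_000):
        print(n, n * log_sobolev_lower_bound(n, a), limit)
```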

To compare (5.3) with (5.4), we would like to set $\mu(k) = x_k$ in the above
inequalities. However, $\mu$ must be a probability distribution on $\{0, 1, \ldots, n-1\}$;
the restriction of $x$ to this set is only a probability distribution if $n > R$
and $t = 0$. We can adapt these inequalities to cope.

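Lemma 7.1. Let $\mu$ be a probability distribution with support $\mathbb{N}$. Let $\alpha$ be
the log Sobolev constant of $\mathrm{BRW}[n, a]$. Then

$$\sum_{k=0}^{n-2} [\mu(k)(1-a) - \mu(k+1)] \log \frac{\mu(k)(1-a)}{\mu(k+1)}
\ge 4\alpha \left[ \sum_{k=0}^{n-1} \mu(k) \log \frac{\mu(k)}{a(1-a)^k} + \log\big(1 - (1-a)^n\big) \right].$$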

Lemma 7.2. Let $\mu$ be a probability distribution on $\mathbb{N}$. If $\mu(0) = a$,

$$\sum_{k=0}^{\infty} \frac{[\mu(k)(1-a) - \mu(k+1)]^2}{a(1-a)^{k+1}} \ge \frac{a^2}{4} \sum_{k=1}^{\infty} \frac{[\mu(k) - a(1-a)^k]^2}{a(1-a)^k}.$$

In Section 9, we will use Lemma 7.1 to see that $D_{KL}$ does indeed become
small fairly quickly. However, as $D_{KL}$ decreases we must increase $n$, so our
bound on $\alpha$ tends to zero. The lower bound on the log Sobolev constant is
too weak to allow Lemma 7.1 to be used to prove Theorem 1.2. We complete
the proof of Theorem 1.2 using Lemma 7.2. Once $D_{KL}$ is small, the inequality
can be used to show that $\log D_{KL}$ decreases with rate order $R^{-2}$.
The lower bound on the log Sobolev constant for $\mathrm{BRW}[n, a]$ might seem
pessimistic, but it does in fact have the correct order. Let $\mu$ be the uniform
distribution on $\{0, 1, \ldots, n-1\}$, and start $\mathrm{BRW}[n, a]$ with initial distribution
$\mu$. By inequality (6.4),

$$\alpha \le -\frac{1}{4 D_{KL}(\mu_t \| \pi)} \frac{d}{dt} D_{KL}(\mu_t \| \pi), \qquad \mu_t = \mu e^{tQ[n,a]}, \quad \pi = \pi[n, a].$$

We can apply the above inequality at $t = 0$ in the limit as $n \to \infty$. Written
in terms of the entropy and mean of $\mu_t$, the divergence $D_{KL}(\mu_t \| \pi)$ is

$$-S(\mu_t) - \log \frac{a}{1 - (1-a)^n} - \log(1-a) \sum_k k \mu_t(k).$$

At $t = 0$, these three terms are $\Theta(\log n)$, $\Theta(\log a)$ and $\Theta(n \log(1-a))$,
respectively: the divergence is $\Theta(n \log(1-a))$. Now consider the rate of change
of the divergence. The entropy $S$ is maximized by the uniform distribution,
so $\frac{d}{dt} S(\mu_t) = 0$ at $t = 0$. The second term is constant. The semigroup $e^{tQ}$
is pulling the distribution toward zero at net rate $1 - (1-a) = a$; the rate
of change of the third term is $-\Theta(a \log(1-a))$. Therefore $\alpha = \Theta(a/n)$ as
$n \to \infty$.

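Proof of Lemma 7.1. The support of $\mu$ is $\mathbb{N}$, so $M := \sum_{j<n} \mu(j) > 0$. Let
$\nu(k) = \mu(k)/M$ for $k < n$. Substitute $\nu$ into inequality (7.1) and multiply through by $M$.
Since $0 < M \le 1$, we have $-M \log M \ge 0$ and $M \log(1 - (1-a)^n) \ge \log(1 - (1-a)^n)$. □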

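Proof of Lemma 7.2. We will make use of a discrete version of Hardy's
inequality [14]. Let $u$ and $v$ be positive functions on $\mathbb{N}$. The inequality states
that, for any $f: \mathbb{N} \to \mathbb{R}$,

$$\sum_{j=0}^{\infty} u(j) \left( \sum_{k=0}^{j} f(k) \right)^2 \le 4B \sum_{j=0}^{\infty} v(j) f(j)^2,$$

where

$$B := \sup_{k \ge 0} \left( \sum_{j \ge k} u(j) \right) \left( \sum_{j=0}^{k} \frac{1}{v(j)} \right).$$

Let $u(k) = v(k) = a(1-a)^{k+1}$ for $k \in \mathbb{N}$; this gives $B = a^{-2}$. The result
follows by taking

$$f(k) = \frac{\mu(k)(1-a) - \mu(k+1)}{a(1-a)^{k+1}}, \qquad k \le n - 1,$$

and $f(k) = 0$ for $k > n - 1$, and then letting $n \to \infty$. □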
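Lemma 7.2 can be spot-checked on random finitely supported distributions with $\mu(0) = a$. In this Python sketch (helper names hypothetical), $\mu(k)$ is taken to be 0 beyond the stored support; there the left-hand terms vanish and the right-hand terms equal $a(1-a)^k$, so the infinite tail of the right-hand sum is added in closed form as $(1-a)^K$.

```python
import random


def lemma72_sides(mu, a):
    """Return (LHS, RHS) of Lemma 7.2 for a finitely supported mu with mu[0] == a.

    mu[k] is taken to be 0 for k >= len(mu). On that range the left-hand terms
    vanish and the right-hand terms equal a (1 - a)^k, which sum to (1 - a)^K.
    """
    K = len(mu)
    ext = list(mu) + [0.0]
    lhs = sum((ext[k] * (1 - a) - ext[k + 1]) ** 2 / (a * (1 - a) ** (k + 1))
              for k in range(K))
    tail = sum((mu[k] - a * (1 - a) ** k) ** 2 / (a * (1 - a) ** k)
               for k in range(1, K))
    tail += (1 - a) ** K
    return lhs, a * a / 4 * tail


def random_mu(a, K, rng):
    """A random probability vector on {0, ..., K-1} with mu[0] = a (needs K >= 2)."""
    w = [rng.random() for _ in range(K - 1)]
    s = sum(w)
    return [a] + [(1 - a) * x / s for x in w]


if __name__ == "__main__":
    rng = random.Random(0)
    mu = random_mu(0.3, 10, rng)
    print(lemma72_sides(mu, 0.3))
```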



It is also helpful to consider the effect of truncating $\mathrm{BRW}[\mathbb{N}, a]$ from the
other side. Let $\mathrm{BRW}[\mathbb{N} + k, a]$ be the walk with bias $a$ on the set $\{k, k+1, k+2, \ldots\}$.
The equilibrium distribution, $\pi[\mathbb{N} + k, a]$, is simply the equilibrium
distribution of $\mathrm{BRW}[\mathbb{N}, a]$ shifted $k$ to the right,

$$\pi[\mathbb{N} + k, a](j) = \pi[\mathbb{N}, a](j - k), \qquad j = k, k+1, \ldots.$$

This provides a very simple stochastic bound on $\mathrm{BRW}[\mathbb{N}, a]$.

Lemma 7.3. $\mathrm{BRW}[\mathbb{N}, a]$ started at $k$ is stochastically smaller than the
equilibrium distribution of $\mathrm{BRW}[\mathbb{N} + k, a]$: $\delta_k e^{tQ[\mathbb{N}, a]} \le_{st} \pi[\mathbb{N} + k, a]$ for all $t \ge 0$.

Proof. This follows by coupling. Construct walks $X \sim \mathrm{BRW}[\mathbb{N}, a]$ and
$Y \sim \mathrm{BRW}[\mathbb{N} + k, a]$ on the same probability space as follows. Let $X(0) = k$
and choose $Y(0)$ according to the distribution $\pi[\mathbb{N} + k, a]$.

Now introduce two Poisson processes, one with rate 1 and one with rate
$1 - a$, to run $X$ and $Y$ for $t > 0$. The rate 1 Poisson process corresponds to
steps to the left. With each arrival of the rate 1 process:

(i) decrement $X$ by 1 unless $X = 0$,

(ii) decrement $Y$ by 1 unless $Y = k$.

If $X \le Y$ immediately before an arrival, then $X \le Y$ after the arrival. The
rate $1 - a$ Poisson process corresponds to steps to the right. With each arrival
of the rate $1 - a$ process, increment both $X$ and $Y$ by 1. This also preserves
$X \le Y$. At $t = 0$, $X \le Y$ so $\mathbb{P}(\forall t, X(t) \le Y(t)) = 1$. Since $Y$ starts in
equilibrium, $Y(t)$ has distribution $\pi[\mathbb{N} + k, a]$ for every $t$. □
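The coupling in the proof of Lemma 7.3 can be simulated directly. This Python sketch (function name hypothetical) superposes the rate-1 and rate-$(1-a)$ clocks into a single Poisson process of rate $2 - a$ and attributes each arrival to one of the two clocks with the appropriate probability; $Y(0)$ is sampled from $\pi[\mathbb{N} + k, a]$ as $k$ plus a geometric variable. It assumes $a \in (0, 1]$.

```python
import random


def coupled_walks(a, k, T, seed=0):
    """Couple X ~ BRW[N, a] started at k with Y ~ BRW[N + k, a] started in equilibrium.

    Two Poisson clocks are shared, as in the proof of Lemma 7.3: the rate-1 clock
    moves both walks left (X blocked at 0, Y blocked at k) and the rate-(1 - a)
    clock moves both walks right. Returns the list of (time, X, Y) after each event.
    """
    rng = random.Random(seed)
    # Y(0) ~ pi[N + k, a]: k plus a variable G with P(G = j) = a (1 - a)^j.
    g = 0
    while rng.random() >= a:
        g += 1
    x, y = k, k + g
    path = [(0.0, x, y)]
    total = 1.0 + (1.0 - a)                  # superposition of the two clocks
    t = rng.expovariate(total)
    while t < T:
        if rng.random() < 1.0 / total:       # arrival of the rate-1 clock
            if x > 0:
                x -= 1
            if y > k:
                y -= 1
        else:                                # arrival of the rate-(1 - a) clock
            x += 1
            y += 1
        path.append((t, x, y))
        t += rng.expovariate(total)
    return path


if __name__ == "__main__":
    path = coupled_walks(0.3, 5, 50.0)
    print(path[-1], all(x <= y for _, x, y in path))
```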

8. Bounds on the fluid limit. To apply Lemmas 7.1 and 7.2, we need
to bound certain functions of the fluid limit. The class of Markov chains
$\mathrm{BRW}[\mathbb{N}, a]$ has a stochastic ordering property. Let $a, b \in [0,1]$. Define Markov
chains $C_a$ and $C_b$ that evolve according to $\mathrm{BRW}[\mathbb{N}, a]$ and $\mathrm{BRW}[\mathbb{N}, b]$,
respectively. Let $\mu_a$ be the initial distribution of $C_a$ and let $\mu_b$ be the initial
distribution of $C_b$.

Lemma 8.1. If $a \le b$ and $\mu_a \ge_{st} \mu_b$, then there is a coupling $(C_a, C_b)$
such that $C_a \ge C_b$ with probability 1.

Fix $s \ge 0$. Suppose $x_0(t) \in [a, b]$ for $t \ge s$. The distribution of the
time-inhomogeneous Markov chain $\mathrm{BRW}[\mathbb{N}, x_0(t)]$, the fluid limit $x$, can be
compared to the distributions of the time-homogeneous Markov chains. Start $C_a$
and $C_b$ at time $s$, both with initial distribution $x(s)$. Let $\mu_{a,t}$ and $\mu_{b,t}$ be the
distributions of $C_a$ and $C_b$ at time $t$,

$$\mu_{a,t} = x(s) e^{(t-s) Q[\mathbb{N}, a]}, \qquad \mu_{b,t} = x(s) e^{(t-s) Q[\mathbb{N}, b]}.$$




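Lemma 8.2. For $t \ge s$, $\mu_{b,t} \le_{st} x(t) \le_{st} \mu_{a,t}$. If $1/(5R) \le a \le 1/(R+1)$,

$$-\log \|\mu_{a,t} - \pi[\mathbb{N}, a]\|_{(1)} = \Omega(R^{-2}(t - s)) \qquad \text{as } (t - s) \to \infty. \tag{8.1}$$

Similarly for $\mu_{b,t}$ if $1/(R+1) \le b \le 4/5$.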

Recall that $\|\cdot\|_{TV} \le \|\cdot\|_{(1)}$. Lemma 8.2 allows us to bound

$$x_k \le \mu_{a,t}(C_a \ge k) - \mu_{b,t}(C_b \ge k+1)
\le (1-a)^k - (1-b)^{k+1} + \|\mu_{a,t} - \pi[\mathbb{N}, a]\|_{TV} + \|\mu_{b,t} - \pi[\mathbb{N}, b]\|_{TV}. \tag{8.2}$$

We will apply Lemma 8.2 iteratively, with bounds $a$ and $b$ improving as
$s \to \infty$. As $x$ tends toward $G^R$ in Kullback-Leibler divergence, $x_0$ tends
toward $G^R(0)$. By Pinsker's inequality [1],

$$(x_0 - G^R(0))^2 \le \|x - G^R\|_{TV}^2 \le 2 D_{KL}(x \| G^R). \tag{8.3}$$

Initially, however, we will take $a$ and $b$ as follows and $s = s_0$.

Lemma 8.3. For $t \ge s_0 = O(R^2 \log R)$:

(i) $x_0(t) \ge a = 1/(5R)$,

(ii) $x_0(t) \le b = 1 - (R+1)^{-O(1)}$,

(iii) $x(t) \le_{st} \pi[\mathbb{N} + j, 1/(5R)]$ with $j = O(R \log R)$.

Now to prove the above. The simplest way to calculate (8.1) seems to be
via the truncated version of the biased random walk.

Lemma 8.4. For $a \in (0,1)$, $\mathrm{BRW}[n, a]$ has spectral gap $|\lambda_2| \ge a^2/4$. For
all $k = 0, 1, 2, \ldots, n-1$,

$$\|\delta_k e^{tQ[n,a]} - \pi[n, a]\|_{(1)} \le \frac{4 e^{-ta^2/4}}{a^2 (1-a)^{k/2}}.$$

Proof of Lemma 8.1. As $\mu_a \ge_{st} \mu_b$, we can choose $C_a(0) \ge C_b(0)$. The
property $C_a \ge C_b$ is preserved if we run the two Markov chains as follows.
At rate 1, decrement both $C_a$ (if $C_a > 0$) and $C_b$ (if $C_b > 0$). At rate $1 - b$,
increment both $C_a$ and $C_b$. At rate $b - a$, increment only $C_a$. □
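The three-clock construction in the proof of Lemma 8.1 can be simulated as well. In this Python sketch (function name hypothetical), each event of a single Poisson process of rate $1 + (1 - b) + (b - a) = 2 - a$ is assigned to one of the three clocks; each marginal then has the correct rates, since $C_a$ steps right at total rate $(1-b) + (b-a) = 1 - a$.

```python
import random


def coupled_brw(a, b, ca0, cb0, T, seed=0):
    """Couple C_a ~ BRW[N, a] and C_b ~ BRW[N, b] (a <= b, ca0 >= cb0) using the
    three clocks from the proof of Lemma 8.1; return the visited pairs (C_a, C_b)."""
    if not (0 <= a <= b <= 1 and ca0 >= cb0):
        raise ValueError("need 0 <= a <= b <= 1 and ca0 >= cb0")
    rng = random.Random(seed)
    rates = [1.0, 1.0 - b, b - a]    # both step left, both step right, only C_a right
    total = sum(rates)
    ca, cb = ca0, cb0
    path = [(ca, cb)]
    t = rng.expovariate(total)
    while t < T:
        u = rng.random() * total
        if u < rates[0]:
            if ca > 0:
                ca -= 1
            if cb > 0:
                cb -= 1
        elif u < rates[0] + rates[1]:
            ca += 1
            cb += 1
        else:
            ca += 1
        path.append((ca, cb))
        t += rng.expovariate(total)
    return path


if __name__ == "__main__":
    path = coupled_brw(0.1, 0.6, 3, 3, 100.0)
    print(path[-1], all(ca >= cb for ca, cb in path))
```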

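Proof of Lemma 8.2. The stochastic bounds follow as in Lemma 8.1;
the fluid limit $x$ evolves according to $\mathrm{BRW}[\mathbb{N}, x_0(t)]$ with $a \le x_0(t) \le b$.

The $\|\cdot\|_{(1)}$ mixing-time for $\mathrm{BRW}[\mathbb{N}, a]$ started at $\mu$, the least $t$ such
that $\|\mu e^{tQ} - \pi[\mathbb{N}, a]\|_{(1)} \le \varepsilon$, very much depends on $\mu$. The inequality in
Lemma 8.4 is uniform in $n$. We can let $n$ tend to infinity:

$$\|\delta_k e^{tQ[\mathbb{N}, a]} - \pi[\mathbb{N}, a]\|_{(1)} \le \frac{4 e^{-ta^2/4}}{a^2 (1-a)^{k/2}} \to 0 \qquad \text{as } t \to \infty.$$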
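However, with $t$ fixed, the above bound grows exponentially in $k$. When $k$
is large, it is better to use Lemma 7.3,

$$\|\delta_k e^{tQ[\mathbb{N}, a]} - \pi[\mathbb{N}, a]\|_{(1)} \le (k + a^{-1}) + a^{-1},$$

since $\|\mu - \pi\|_{(1)}$ is at most the sum of the means of $\mu$ and $\pi$, and the means of
$\pi[\mathbb{N} + k, a]$ and $\pi[\mathbb{N}, a]$ are at most $k + a^{-1}$ and $a^{-1}$. By linearity and a
triangle inequality,

$$\|\mu_{a,t} - \pi[\mathbb{N}, a]\|_{(1)} \le \sum_k x_k(s) \min\left\{ k + 2a^{-1}, \frac{4 e^{-(t-s)a^2/4}}{a^2 (1-a)^{k/2}} \right\}.$$

The bound (8.1) now follows by applying the stochastic upper bound on
$x(s)$, Lemma 8.3(iii). □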

Proof of Lemma 8.4. Let $F_1, \ldots, F_n$ be the left eigenvectors of $Q$, normalized
as in Section 6. The first eigenvector $F_1$ is the equilibrium distribution
$\pi$; the corresponding eigenvalue is 0. The other $n - 1$ eigenvectors of the
Markov chain can be written as follows. Let $A_j = \sqrt{1-a}\, \exp(i\pi(j-1)/n)$ for
$j = 2, \ldots, n$. Then with $c_j$ a normalizing constant, $F_j(k) = c_j \operatorname{Im}[(1 - A_j) A_j^k]$.
Check

$$(F_j Q)(k) = -F_j(k)\, |1 - A_j|^2, \qquad j = 2, \ldots, n.$$

Therefore for $j \ge 2$, the $j$th eigenvalue is

$$\lambda_j = -|1 - A_j|^2 \le -a^2/4,$$

since $|1 - A_j| \ge 1 - \sqrt{1-a} \ge a/2$. By (6.2),

$$|\delta_k e^{tQ}(\ell) - \pi(\ell)| \le e^{-ta^2/4} (1-a)^{(\ell - k)/2}.$$

Multiply by $\ell$ and sum over $\ell = 0, 1, \ldots, n-1$, using
$\sum_{\ell \ge 0} \ell (1-a)^{\ell/2} \le (1 - \sqrt{1-a})^{-2} \le 4a^{-2}$. □
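The eigenvector formula in the proof of Lemma 8.4 can be verified numerically. This Python sketch (helper names hypothetical) builds the unnormalized vector $F_j(k) = \operatorname{Im}[(1 - A_j) A_j^k]$, applies $Q[n, a]$ from the left, and checks $(F_j Q)(k) = -|1 - A_j|^2 F_j(k)$ together with the bound $\lambda_j \le -a^2/4$.

```python
import cmath
import math


def brw_apply_left(F, a):
    """Compute (F Q)(k) for Q = Q[n, a] acting on a row vector F of length n."""
    n = len(F)
    out = []
    for k in range(n):
        v = 0.0
        if k + 1 < n:
            v += F[k + 1] * 1.0            # a left step from k + 1 lands on k
        if k - 1 >= 0:
            v += F[k - 1] * (1.0 - a)      # a right step from k - 1 lands on k
        leave = (1.0 if k > 0 else 0.0) + ((1.0 - a) if k < n - 1 else 0.0)
        v -= F[k] * leave                  # diagonal entry Q(k, k) = -leave
        out.append(v)
    return out


def candidate_eigenpair(n, a, j):
    """Unnormalized F_j(k) = Im[(1 - A_j) A_j^k] and lambda_j = -|1 - A_j|^2."""
    A = math.sqrt(1 - a) * cmath.exp(1j * math.pi * (j - 1) / n)
    F = [((1 - A) * A ** k).imag for k in range(n)]
    return F, -abs(1 - A) ** 2


if __name__ == "__main__":
    n, a = 6, 0.3
    for j in range(2, n + 1):
        F, lam = candidate_eigenpair(n, a, j)
        residual = max(abs(x - lam * f) for x, f in zip(brw_apply_left(F, a), F))
        print(j, round(lam, 6), residual)
```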

Proof of Lemma 8.3(i). Let $n = 2R$. We can define an evolving probability
distribution,

$$y = (y_k(t)) \quad \text{on } \{0, 1, 2, \ldots, n-1\} \cup \{+\infty\},$$

with the following properties:

(i) $x(t) \le_{st} y(t)$ for all $t \ge 0$, so $x_0(t) \ge y_0(t)$,

(ii) $y_0(t) \ge 1/(5R)$ for $t \ge s_0 = O(R^2 \log R)$.

Apply Markov's inequality to the fluid limit,

$$\sum_{k=0}^{\infty} k x_k = R \quad \text{so} \quad \sum_{k=0}^{n-1} x_k \ge 1/2.$$

Define probability distribution $y(0)$ by $y_{n-1}(0) = 1/2$ and $y_\infty(0) = 1/2$. Let
$\bar{y}$ be the restriction of $y$ to $\{0, 1, \ldots, n-1\}$: $\bar{y} = (y_k)_{k=0}^{n-1}$. Define $y(t)$ as
follows:
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(i) For all t, let yoo{t) = 1/2. 

(ii) Let Q be the transition rate matrix for BRW[n, 0]; let y(t) = y(0)e*'^. 

We have x(0) <st y(0) by the choice of y(0). If y was being acted on 
by BRW[N,0], x(t) <st y{t) would follow by a proof similar to that of 
Lemma 8.1. However, y is being acted on by BRW[n,0]. We must show 
that as mass of x at n — 1 (that is coupled to mass of y at n — 1) flows 
to n, we can modify the coupling suitably. The inequality '^k<n^k ^ 1/2 is 
preserved: for any mass of x (coupled to mass of y at n — 1) flowing to n, 
there must be some mass of x below n that is coupled to mass of y at +oo. 
Using this slack, an exchange can be made in the coupling. 

The equilibrium distribution of BRW[$n$, 0] is uniform. The mass of $\bar y$ is 1/2, so $y_0 \to \frac12 n^{-1} = 1/(4R)$. To see that $y_0 \ge 1/(5R)$ after time $O(R^2\log R)$, apply (6.3). The spectral gap of BRW[$n$, 0] is given by setting $a = 0$ in the proof of Lemma 8.4:

$$|\lambda_2| = |1-A_2|^2 \ge 4n^{-2}, \qquad A_j = \exp(i\pi(j-1)/n). \qquad \square$$
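The last step can be illustrated numerically. The sketch reads BRW[$n$, 0] as the walk on $\{0, \ldots, n-1\}$ with rate 1 in each direction, held at the ends (an assumed reading of the notation). It checks $|\lambda_2| \ge 4/n^2$ and then evolves the restriction $\bar y$ of $y$, which starts with mass $1/2$ at $n-1$. Because the walk is symmetric, $|\delta_{n-1}e^{tQ}(0) - 1/n| \le e^{-|\lambda_2|t}$. So at the hypothetical time $t = n^2\log(5n)/4$ we get $e^{-|\lambda_2|t} \le 1/(5n)$, and hence $y_0(t) \ge 1/(5R)$. The helper name is hypothetical.

```python
import numpy as np


def brw0_generator(n):
    """Rate matrix of BRW[n, 0]: rate 1 up and rate 1 down, held at 0 and n-1."""
    Q = np.zeros((n, n))
    i = np.arange(n - 1)
    Q[i, i + 1] = 1.0
    Q[i + 1, i] = 1.0
    Q -= np.diag(Q.sum(axis=1))
    return Q


R = 8
n = 2 * R
Q = brw0_generator(n)
evals, evecs = np.linalg.eigh(Q)        # Q is symmetric; eigh sorts eigenvalues ascending
gap = -evals[-2]                        # evals[-1] is the eigenvalue 0
print("spectral gap:", gap, " 4/n^2:", 4 / n ** 2)

# ybar(0) puts mass 1/2 at n - 1, and ybar(t) = ybar(0) e^{tQ}
t = n ** 2 * np.log(5 * n) / 4          # chosen so that exp(-gap * t) <= 1/(5n)
heat = evecs @ np.diag(np.exp(t * evals)) @ evecs.T
ybar0 = np.zeros(n)
ybar0[-1] = 0.5
y0 = (ybar0 @ heat)[0]
print("y_0(t) =", y0, " 1/(5R) =", 1 / (5 * R), " limit 1/(4R) =", 1 / (4 * R))
```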

To show Lemma 8.3(iii), we must bound how far $x$ shifts to the right before the lower bound $x_0 \ge 1/(5R)$ is in effect.

Lemma 8.5. Let $X = Y - Z$ with $Y, Z$ independent Poisson($s_0$) random variables. Let $\nu$ be given by $\nu(k) = P(X \in \{k-R, k+R+1\})$. Then $x(s_0) \le_{st} \nu$.

We will need to apply a concentration bound to $\nu$.

Theorem 8.6 ([13], Theorem 2.7). Let $X$ be the sum of $n$ independent random variables: $X = X(1) + \cdots + X(n)$. Let $\mathrm{Var}(X)$ be the variance of $X$. Suppose $X(i) - E(X(i)) \le 1$ for each $i$. For any $\lambda > 0$,

$$P(X \ge EX + \lambda) \le \exp\Bigl(-\frac{\lambda^2}{2\,\mathrm{Var}(X) + 2\lambda/3}\Bigr).$$
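As a sanity check on the form of the bound, one can compare it with an exact tail in a case where both are easy to compute. The sketch takes $X$ to be Binomial($n$, $p$), a sum of independent Bernoulli($p$) variables, each of which satisfies $X(i) - E(X(i)) \le 1$. The parameter values and function names are arbitrary choices for illustration.

```python
import math


def binomial_upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def theorem_8_6_bound(var, lam):
    """Right-hand side of Theorem 8.6: exp(-lam^2 / (2 Var(X) + 2 lam / 3))."""
    return math.exp(-lam ** 2 / (2 * var + 2 * lam / 3))


n, p = 200, 0.1
mean, var = n * p, n * p * (1 - p)
rows = []
for lam in (2, 5, 10, 20):
    exact = binomial_upper_tail(n, p, math.ceil(mean + lam))
    bound = theorem_8_6_bound(var, lam)
    rows.append((lam, exact, bound))
    print(f"lambda={lam:>2}  exact={exact:.3e}  bound={bound:.3e}")
```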

Proof of Lemma 8.3(iii). If $x(s_0) \le_{st} \pi[\mathbb{N}+j, 1/(5R)]$, then for all $t \ge s_0$, $x(t) \le_{st} \pi[\mathbb{N}+j, 1/(5R)]$. We can use the concentration bound, Theorem 8.6, to show $\nu \le_{st} \pi[\mathbb{N}+j, 1/(5R)]$ with $j = O(R\log R)$. The Poisson random variables $Y$ and $Z$ can be approximated by binomial random variables (which in turn can be written as sums of Bernoulli random variables). □

Proof of Lemma 8.5. By Lemma 8.2 with $a = 0$, $x$ is stochastically smaller than the walk BRW[$\mathbb{N}$, 0] started at $R$. We can use a reflection principle to calculate the distribution $\nu = \delta_R e^{s_0Q}$, where $Q$ is the transition rate matrix of BRW[$\mathbb{N}$, 0]. Let $C$ be a random walk on $\mathbb{Z}$ that starts at $R$, then steps forward at rate 1 and backward at rate 1; in keeping with our notation, the walk is BRW[$\mathbb{Z}$, 0]. Consider the reflection of $C$ in the point $-1/2$: $-1-C$ starts at $-1-R$ and then also evolves according to BRW[$\mathbb{Z}$, 0]. The law of $\max\{C, -1-C\}$ is exactly the distribution of the random walk BRW[$\mathbb{N}$, 0] started at $R$. □
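The reflection argument can be checked numerically. The sketch computes $\nu(k) = P(X = k-R) + P(X = k+R+1)$ by convolving two Poisson($s_0$) pmfs. It compares the result with $\delta_R e^{s_0Q}$, where $Q$ is the rate matrix of BRW[$\mathbb{N}$, 0] truncated to $\{0, \ldots, M-1\}$. The truncations (the pmf `cutoff` and the hypothetical size `M`) are assumptions of the sketch, chosen large enough that the neglected mass is far below the comparison tolerance.

```python
import math
import numpy as np


def poisson_pmf(lam, kmax):
    """P(Poisson(lam) = k) for k = 0..kmax."""
    return np.array([math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
                     for k in range(kmax + 1)])


def nu(R, s0, kmax, cutoff=80):
    """nu(k) = P(X in {k - R, k + R + 1}), X = Y - Z with Y, Z iid Poisson(s0)."""
    py = poisson_pmf(s0, cutoff)

    def p_x(m):
        # P(X = m) = sum_z P(Y = m + z) P(Z = z), with both indices in [0, cutoff]
        z = np.arange(max(0, -m), cutoff + 1 - max(0, m))
        return float(np.sum(py[z + m] * py[z]))

    return np.array([p_x(k - R) + p_x(k + R + 1) for k in range(kmax)])


def brw0_law(start, t, M=120):
    """Law at time t of BRW[N, 0] started at `start`, truncated to {0, ..., M-1}."""
    Q = np.zeros((M, M))
    i = np.arange(M - 1)
    Q[i, i + 1] = 1.0
    Q[i + 1, i] = 1.0
    Q -= np.diag(Q.sum(axis=1))
    w, V = np.linalg.eigh(Q)
    return V[start] @ np.diag(np.exp(t * w)) @ V.T     # row `start` of e^{tQ}


R, s0, kmax = 5, 3.0, 30
reflected = nu(R, s0, kmax)
direct = brw0_law(R, s0)[:kmax]
print("max |difference|:", np.abs(reflected - direct).max())
```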

Proof of Lemma 8.3(ii). This follows from part (iii). Choose $n$ and $b$ such that

$$\sum_{k\ge n} k\,\pi[\mathbb{N}+j,a](k) \le R, \qquad b = 1 - \sum_{k\ge n}\pi[\mathbb{N}+j,a](k).$$

Suppose, for a contradiction, that $x_0 > b$. It is then impossible to find $x_1, x_2, \ldots$ in $[0,1]$ such that

$$\sum_{k\ge0} x_k = 1, \qquad \sum_{k\ge0} k x_k = R \qquad \text{and} \qquad (x_0, x_1, x_2, \ldots) \le_{st} \pi[\mathbb{N}+j,a].$$

Therefore $x_0 \le b$. We can take $n = O(R\log R)$, so the result follows. □
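The choice of $n$ and $b$ in this proof can be made explicit. The sketch assumes that $\pi[\mathbb{N}+j,a]$ is the geometric distribution $a(1-a)^{k-j}$ on $\{j, j+1, \ldots\}$. Then $\sum_{k\ge n}\pi[\mathbb{N}+j,a](k) = (1-a)^{n-j}$ and $\sum_{k\ge n}k\,\pi[\mathbb{N}+j,a](k) = (1-a)^{n-j}(n + (1-a)/a)$. It takes $a = 1/(5R)$ and, purely for illustration, $j = \lceil R\log R\rceil$. The function names are hypothetical.

```python
import math


def tail_mass(n, j, a):
    """sum_{k >= n} pi[N + j, a](k), assuming pi[N + j, a](k) = a (1 - a)^(k - j), k >= j."""
    return (1 - a) ** (n - j)


def tail_mean(n, j, a):
    """sum_{k >= n} k pi[N + j, a](k) under the same assumption."""
    return (1 - a) ** (n - j) * (n + (1 - a) / a)


def choose_n_and_b(R, j):
    """Smallest n >= j with tail_mean <= R, and the resulting b = 1 - tail_mass."""
    a = 1 / (5 * R)
    n = j
    while tail_mean(n, j, a) > R:      # tail_mean is decreasing in n
        n += 1
    return n, 1 - tail_mass(n, j, a)


for R in (10, 100, 1000):
    j = math.ceil(R * math.log(R))
    n, b = choose_n_and_b(R, j)
    print(f"R={R:>4}  n={n:>6}  n/(R log R)={n / (R * math.log(R)):.2f}  1-b={1 - b:.3e}")
```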

9. Convergence of $D_{KL} = D_{KL}(x\|G^R)$. Now that we have bounds on the fluid limit, we can show that $D_{KL}$ decreases rapidly. As soon as we have the lower bound $x_0 \ge 1/(5R)$, we can use Lemma 7.1.

Lemma 9.1. As $t \to \infty$,

$$D_{KL}(t) = D_{KL}(0)\exp\bigl(-\Omega(R^{-1}\sqrt{t})\bigr), \qquad D_{KL}(0) = O(\log R).$$

This eliminates the possibility that $D_{KL}$ gets "stuck" some distance away from zero. Once $D_{KL}$ is sufficiently small, we can use Lemma 7.2 to prove Theorem 1.2.
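The estimate $D_{KL}(0) = O(\log R)$ is a one-line computation. Suppose every box starts with $R$ balls, so $x(0) = \delta_R$, and take $G^R$ to be the geometric distribution on $\mathbb{N}$ with mean $R$, $G^R(k) = (R+1)^{-1}(R/(R+1))^k$ (an assumption stated here for the sketch). Then $D_{KL}(0) = -\log G^R(R) = \log(R+1) + R\log(1+1/R)$, and this is at most $\log(R+1) + 1$ because $\log(1+u) \le u$. A quick numerical confirmation, with a hypothetical function name:

```python
import math


def dkl_at_time_zero(R):
    """D_KL(delta_R || G^R) = -log G^R(R), with G^R(k) = (R + 1)^-1 (R / (R + 1))^k."""
    return math.log(R + 1) + R * math.log1p(1 / R)


for R in (1, 10, 100, 1000):
    direct = -math.log((1 / (R + 1)) * (R / (R + 1)) ** R)
    print(R, dkl_at_time_zero(R), direct)
```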

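Proof of Lemma 9.1. As discussed in Section 7, the log Sobolev constant of BRW[$n$, $a$] is $\alpha = \Theta(a/n)$. If $(1-x_0)^n \le 1/2$, then by the first line of (5.3), (5.4) and Lemma 7.1 (applied to $x$, with $a = x_0$),

$$-\frac{d}{dt}D_{KL} \ge \Theta(x_0/n)\Bigl[D_{KL} - \sum_{k\ge n} x_k\log\frac{x_k}{x_0(1-x_0)^k} - 2(1-x_0)^n\Bigr].$$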

If we can find $n = n(D_{KL})$ such that, say,

$$2(1-x_0)^n + \sum_{k\ge n} x_k\log\frac{x_k}{x_0(1-x_0)^k} \le D_{KL}/2, \tag{9.1}$$

then

$$-\frac{d}{dt}D_{KL} \ge \Theta(x_0/n)\,D_{KL}. \tag{9.2}$$
We must let $n \to \infty$ as $D_{KL} \to 0$ to maintain inequality (9.1). However, the quicker $n$ grows, the weaker the bound (9.2) becomes.

To apply Lemma 8.3, assume $t \ge s_0$; let $a = 1/(5R)$, $b = 1-(R+1)^{-O(1)}$ and $j = O(R\log R)$. We then have $x_0 \in [a,b]$ and $x \le_{st} \pi[\mathbb{N}+j, a]$, so

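$$\begin{aligned} 2(1-x_0)^n + \sum_{k\ge n} x_k\log\frac{x_k}{x_0(1-x_0)^k} &\le 2(1-a)^n + \sum_{k\ge n}\pi[\mathbb{N}+j,a](k)\,\bigl[-\log a - k\log(1-b)\bigr]\\ &\le 2(1-a)^n + (1-a)^{n-j}\bigl[-\log a - (n+a^{-1})\log(1-b)\bigr]. \end{aligned}$$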

Of course, $1-a \le \exp(-a)$. Taking $n = O(-R\log D_{KL})$ above, we satisfy inequality (9.1). Putting $n = O(-R\log D_{KL})$ and $x_0 = \Omega(R^{-1})$ into (9.2),

$$\frac{d}{dt}(-\log D_{KL}) \ge \Theta\Bigl(\frac{R^{-2}}{-\log D_{KL}}\Bigr). \qquad \square$$
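The last inequality is what produces the $\sqrt{t}$ in Lemma 9.1. Here is a sketch of the integration. It writes $c > 0$ for the implicit constant in $\Theta(\cdot)$ (a hypothetical name) and restricts to times $t \ge t_1 \ge s_0$ at which $D_{KL} < 1$, so that $u := -\log D_{KL} > 0$:

```latex
u'(t) \ge \frac{c}{R^2\,u(t)}
\;\Longrightarrow\;
\frac{d}{dt}\,u(t)^2 = 2\,u(t)\,u'(t) \ge \frac{2c}{R^2}
\;\Longrightarrow\;
u(t)^2 \ge u(t_1)^2 + \frac{2c}{R^2}\,(t - t_1),
\\[4pt]
D_{KL}(t) = e^{-u(t)} \le \exp\Bigl(-\frac{\sqrt{2c\,(t - t_1)}}{R}\Bigr)
= \exp\bigl(-\Omega(R^{-1}\sqrt{t})\bigr) \qquad (t \to \infty).
```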

Proof of Theorem 1.2. We start off by preparing a "daisy-chain" of inequalities, using (5.3), (5.4), Lemma 7.2 and the function $\phi$:

(i) $$\frac{d}{dt}(-D_{KL}) \ge \sum_{k=0}^{\infty}\frac{(x_k(1-x_0)-x_{k+1})^2}{\max\{x_k(1-x_0),\,x_{k+1}\}},$$

(ii) $$\sum_{k=0}^{\infty}\frac{(x_k(1-x_0)-x_{k+1})^2}{\max\{x_k(1-x_0),\,x_{k+1}\}} \ge \sum_{k=0}^{n-1}\frac{(x_k(1-x_0)-x_{k+1})^2}{x_0(1-x_0)^{k+1}}\,\inf_{k<n}\frac{x_0(1-x_0)^k}{x_k},$$

(iii) $$\sum_{k=0}^{n-1}\frac{(x_k(1-x_0)-x_{k+1})^2}{x_0(1-x_0)^{k+1}} \ge \frac{x_0^2}{4}\sum_{k=1}^{n-1}\frac{(x_k-x_0(1-x_0)^k)^2}{x_0(1-x_0)^k},$$

(iv) $$\sum_{k=1}^{n-1}\frac{(x_k-x_0(1-x_0)^k)^2}{x_0(1-x_0)^k} \ge \sum_{k=1}^{n-1}\phi(x_0(1-x_0)^k, x_k),$$

(v) $$\sum_{k=1}^{\infty}\phi(x_0(1-x_0)^k, x_k) \ge D_{KL}.$$

Joining these together,

$$\frac{d}{dt}(-D_{KL}) \ge \frac{x_0^2}{4}\Bigl(D_{KL} - \sum_{k\ge n}\phi(x_0(1-x_0)^k, x_k)\Bigr)\inf_{k<n}\frac{x_0(1-x_0)^k}{x_k}.$$

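The result now follows if we can choose $n = n(D_{KL})$ such that, as $D_{KL} \to 0$,

$$\text{(I)}\quad \frac{1}{D_{KL}}\sum_{k\ge n}\phi(x_0(1-x_0)^k, x_k) \to 0, \qquad \text{(II)}\quad \inf_{k<n}\frac{x_0(1-x_0)^k}{x_k} \ge 1/2.$$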

We need n large for (I), but not too large or (II) might fail. 
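Steps (iv) and (v) of the daisy chain hold for elementary reasons. For (iv), $(b-a)^2/a \ge \phi(a,b)$ because $\log(b/a) \le (b-a)/a$. For (v), $\sum_{k\ge0}\phi(x_0(1-x_0)^k, x_k)$ is the divergence of $x$ from the geometric law with parameter $x_0$, and that divergence is smallest when the geometric law has the same mean $R$ as $x$. The sketch checks both numerically on random distributions. It assumes $\phi(a,b) = b\log(b/a) - (b-a)$, the form that appears later in this proof, and $G^R(k) = (R+1)^{-1}(R/(R+1))^k$. Here $R$ is simply the mean of each random $x$ and need not be an integer.

```python
import numpy as np


def phi(a, b):
    """phi(a, b) = b log(b / a) - (b - a), with the convention 0 log 0 = 0."""
    b = np.asarray(b, dtype=float)
    safe = np.where(b > 0, b, 1.0)
    return np.where(b > 0, b * np.log(safe / a), 0.0) - (b - a)


rng = np.random.default_rng(0)
K = 40
k = np.arange(K)
worst_iv, worst_v = np.inf, np.inf
for _ in range(200):
    x = rng.dirichlet(np.ones(K))              # random distribution on {0, ..., K-1}
    R = float(k @ x)                           # its mean plays the role of R
    x0 = x[0]
    geo_x0 = x0 * (1 - x0) ** k                # x_0 (1 - x_0)^k
    g_R = (R / (R + 1)) ** k / (R + 1)         # G^R(k)
    dkl = float(np.sum(x * np.log(x / g_R)))   # D_KL(x || G^R)

    # (iv), term by term: (x_k - x_0(1-x_0)^k)^2 / (x_0(1-x_0)^k) >= phi(x_0(1-x_0)^k, x_k)
    chi = (x - geo_x0) ** 2 / geo_x0
    worst_iv = min(worst_iv, float(np.min(chi - phi(geo_x0, x))))

    # (v): for k >= K we have x_k = 0 and phi(a, 0) = a, which sums to (1 - x_0)^K
    lhs_v = float(np.sum(phi(geo_x0, x)[1:])) + (1 - x0) ** K
    worst_v = min(worst_v, lhs_v - dkl)

print("min slack in (iv):", worst_iv, " min slack in (v):", worst_v)
```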

For $\ell = 1, 2, \ldots$, let $c_\ell = \exp(-\exp(\ell))$. We will use the sequence $(c_\ell)$ to measure the rate of decrease of $D_{KL}$. As $D_{KL}$ is monotonically decreasing, we can let $s_\ell$ be the unique time at which $D_{KL} = c_\ell$. We have chosen $(c_\ell)$ to be decreasing doubly exponentially, so we must show that $(s_\ell)$ grows only exponentially.
Consider the time period from $s_\ell$ to $s_{\ell+1}$. Using inequality (8.3) with $D_{KL} \le c_\ell$, we can find $a$ and $b$ such that $x_0 \in [a,b]$ from $s_\ell$ onward:

(9.3) --flTT-^' *=flTT + ^- 

Let $\mu_a$ and $\mu_b$ be defined as in Lemma 8.2 with $s = s_\ell$. To take advantage of the bounds $\mu_b \le_{st} x \le_{st} \mu_a$, we must wait until $\mu_a$ and $\mu_b$ are close to their respective equilibrium distributions. We will split the time period $[s_\ell, s_{\ell+1}]$ into two parts: $[s_\ell, s'_\ell]$, where we wait for $\mu_a$ and $\mu_b$ to mix, and $[s'_\ell, s_{\ell+1}]$, where (I) and (II) hold with $n = \lceil\log(c_{\ell+2})/\log(1-a)\rceil$.
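Since $c_\ell = \exp(-e^\ell)$ underflows in floating point almost immediately, it is easiest to look at this choice of $n$ on the log scale. The sketch computes $n = \lceil\log(c_{\ell+2})/\log(1-a)\rceil$, checks that $(1-a)^n \le c_{\ell+2}$ and that $n$ is the smallest such integer, and reports $n/(Re^\ell)$, which stays bounded. The value of $a$ is a hypothetical stand-in for the lower bound in (9.3), taken just below $G^R(0) = 1/(R+1)$. The function names are hypothetical.

```python
import math


def log_c(l):
    """log c_l, where c_l = exp(-exp(l)); c_l itself would underflow."""
    return -math.exp(l)


def n_of(l, a):
    """n = ceil(log(c_{l+2}) / log(1 - a))."""
    return math.ceil(log_c(l + 2) / math.log1p(-a))


R = 10
a = 1 / (R + 1) - 1e-3          # hypothetical value of a, just below 1/(R+1)
table = []
for l in range(1, 9):
    n = n_of(l, a)
    table.append((l, n, n / (R * math.exp(l))))
    print(f"l={l}  log c_(l+2)={log_c(l + 2):.1f}  n={n}  n/(R e^l)={table[-1][2]:.2f}")
```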

By inequality (8.1), from time $s'_\ell := s_\ell + O(R^2e^\ell)$ onward, the bound $\mu_a$ satisfies (and similarly for $\mu_b$)

(9.4) ||;u,-7r[N,a]||TV< ||Ma-vr[N,a]||(i) <c|. 
Combining (9.4) with $x \le_{st} \mu_a$ allows us to bound the tail of $x$,

^ /cxfc < ^ kii{k) 

k>n k>n 

< ^A;7r[N,a](A;) + ||/i„-7r[N,a]||(i) 

k>n 

< (n + a-^)(l-a)" + c|. 
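Let $t \in [s'_\ell, s_{\ell+1}]$. By the definition of $\phi$,

$$\sum_{k\ge n}\phi(x_0(1-x_0)^k, x_k) = \sum_{k\ge n}\Bigl[x_k\log\frac{x_k}{x_0(1-x_0)^k} - \bigl(x_k - x_0(1-x_0)^k\bigr)\Bigr] \le (1-a)^n + \sum_{k\ge n} x_k\bigl[-\log a - k\log(1-b)\bigr].$$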

k>n 

As $(1-a)^n \le c_{\ell+2} \ll c_{\ell+1} \le D_{KL}$, this implies (I). By inequality (8.2),
xo(l-xo)^^ a(l-6)'= 



Xk {l-a)k -{l-b)k+^ + 2cj' 

Implicitly, $n$ is a function of $\ell$. By (9.3), in the limit as $\ell \to \infty$,

$$\sup_{k<n}\frac{(1-a)^k}{(1-b)^k} \to 1,$$

so (II) holds for $\ell$ sufficiently large.

This gives $s_{\ell+1} - s'_\ell = O(R^2e^\ell)$ and therefore $s_{\ell+1} - s_\ell = O(R^2e^\ell)$, too. Hence $s_\ell = O(R^2e^\ell)$. □
10. Convergence to the fluid limit. We now prove that the empirical
distribution converges to the fluid limit.

Proof of Theorem 1.1. We will use Theorem 2.2 from [7]. For $N \in \mathbb{N}$, the empirical distribution is a pure jump process in $I_N = (\mathbb{N}/N)^{\mathbb{N}}$. Let the vectors $e_0 = (1, 0, 0, \ldots)$, $e_1 = (0, 1, 0, \ldots)$, $\ldots$ be the canonical basis for $\mathbb{R}^{\mathbb{N}}$.
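The Lévy kernel of $X$ is defined, for distributions $x \in I_N$ with mass $\sum_k x_k = 1$, by

$$K^N(x,dy) = N\sum_{i>0}\sum_{j\ge0}\Bigl(x_ix_j - \frac{\mathbf{1}_{\{i=j\}}\,x_i}{N}\Bigr)\,\delta\bigl[e_{i-1}/N - e_i/N - e_j/N + e_{j+1}/N\bigr].$$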

The formal "limit kernel" is

$$K(x,dy) = \sum_{i>0}\sum_{j\ge0} x_ix_j\,\delta\bigl[e_{i-1} - e_i - e_j + e_{j+1}\bigr].$$

The limit kernel encodes the fluid limit differential equation (5.2),

$$\frac{d}{dt}x(t) = b(x(t)), \qquad b(x) := \int y\,K(x,dy).$$
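Reading off $\int y\,K(x,dy)$ term by term gives the drift $b(x) = \sum_{i>0}\sum_{j\ge0}x_ix_j(e_{i-1}-e_i-e_j+e_{j+1})$, which is easy to integrate numerically. The sketch below is an illustration only, not the construction used in the proofs. It truncates to $\{0, \ldots, K-1\}$ and, as an assumption of the sketch, forbids moves that would push a box above $K-1$ (only sinks with $j < K-1$ are allowed), so that mass and mean are conserved exactly. Starting from $x(0) = \delta_R$, it integrates with a classical fourth-order Runge–Kutta step and records $D_{KL}(x(t)\|G^R)$, taking $G^R(k) = (R+1)^{-1}(R/(R+1))^k$. The function names are hypothetical.

```python
import numpy as np


def drift(x):
    """b(x) = sum_{i>0} sum_{j<K-1} x_i x_j (e_{i-1} - e_i - e_j + e_{j+1})."""
    s = x[1:].sum()                 # mass that can act as a source (i > 0)
    u = x[:-1].sum()                # mass that can act as a sink (j < K-1)
    out = np.zeros_like(x)
    out[:-1] += u * x[1:]           # source box: i -> i - 1
    out[1:] -= u * x[1:]
    out[1:] += s * x[:-1]           # sink box: j -> j + 1
    out[:-1] -= s * x[:-1]
    return out


def rk4(x, dt, steps):
    """Classical fourth-order Runge-Kutta steps for dx/dt = drift(x)."""
    for _ in range(steps):
        k1 = drift(x)
        k2 = drift(x + 0.5 * dt * k1)
        k3 = drift(x + 0.5 * dt * k2)
        k4 = drift(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x


def dkl_to_geometric(x, R):
    """D_KL(x || G^R) with G^R(k) = (R + 1)^-1 (R / (R + 1))^k."""
    k = np.arange(len(x))
    g = (R / (R + 1)) ** k / (R + 1)
    m = x > 0
    return float(np.sum(x[m] * np.log(x[m] / g[m])))


R, K = 2, 60
x = np.zeros(K)
x[R] = 1.0                          # every box starts with R balls
history = [dkl_to_geometric(x, R)]
for _ in range(30):                 # 30 time units with dt = 0.01
    x = rk4(x, 0.01, 100)
    history.append(dkl_to_geometric(x, R))
print([round(d, 4) for d in history[::5]])
```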

We must first show that the fluid limit differential equation has a unique solution. Consider the set $S = \{x \in [0,1]^{\mathbb{N}} : \sum_k x_k \le 1\}$ of "subprobability" distributions. Equipped with the sup-norm, $S$ is a complete metric space and $b$ is a Lipschitz function on $S$.

Now fix $T > 0$. To apply the fluid limit theorem, we must bound the tails of the fluid limit and the empirical distribution for $t \in [0,T]$. Let $\mu_t$ be the distribution of a random walk on $\mathbb{N}$ that starts at $R$, then steps forward at rate 1. This is a Poisson arrival process; the number of steps taken forward is Poisson($t$). By a slight modification to Lemma 8.2,

$$x(t) \le_{st} \mu_t \le_{st} \mu_T.$$

The empirical distribution has an analogous bound. Recall that $B_1, \ldots, B_N$ are the numbers of balls in each box. Let $C_1, \ldots, C_N$ be independent copies of the Poisson arrival process, each starting at $C_i(0) = R$. There is a coupling $(B, C)$ such that $B_i \le C_i$ for all $i$: every time box $i$ is picked as the sink box in the MFZRP, increment $C_i$ by one.

Let $d \in \mathbb{N}$, $d > R$. Let $p(d)$ be the probability that $C_i(T) > d$. The number of boxes with occupancy exceeding $d$ at any point up to time $T$ is stochastically smaller than Binomial($N$, $p(d)$). By Markov's inequality, $p(d) \to 0$ as $d \to \infty$. By concentration (Theorem 8.6), the probability that $\sum_{k>d} X_k$ exceeds $2p(d)$ at any point up to time $T$ decays exponentially in $N$.
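For concreteness, $p(d)$ can be computed exactly. Since $C_i(T) = R + \text{Poisson}(T)$, we have $p(d) = P(\text{Poisson}(T) > d-R)$, and Markov's inequality gives $p(d) \le E\,C_i(T)/d = (R+T)/d$. Apply Theorem 8.6 to the Binomial($N$, $p$) count with $\lambda = Np$. This bounds the probability that at least $2Np$ boxes exceed $d$ by $\exp(-Np/(2(1-p)+2/3)) \le \exp(-3Np/8)$, which decays exponentially in $N$. The parameter values and function names below are arbitrary choices for illustration.

```python
import math


def poisson_tail(lam, m):
    """P(Poisson(lam) > m), computed as 1 - P(Poisson(lam) <= m)."""
    term, cdf = math.exp(-lam), 0.0
    for i in range(m + 1):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def p_of_d(d, R, T):
    """p(d) = P(C_i(T) > d) with C_i(T) = R + Poisson(T)."""
    return poisson_tail(T, d - R)


R, T = 5, 2.0
for d in (8, 12, 16, 20):
    p = p_of_d(d, R, T)
    print(f"d={d:>2}  p(d)={p:.3e}  Markov bound={(R + T) / d:.3f}  "
          f"Bernstein exponent / N={p / (2 * (1 - p) + 2 / 3):.3e}")
```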

The proof of Theorem 2.2 [7] considers truncated, finite-dimensional versions of the jump process and fluid limit. Truncating the distribution (setting $x_k = 0$ for $k > d$) changes the mass to $m = \sum_k x_k \in [0,1]$. We can use the formula for $K$ to extend the MFZRP to all points $x \in I_N \cap S$. The corresponding balls-and-boxes Markov chain has $Nm$ boxes, with balls distributed according to $x$. Each ordered pair of boxes interacts at rate $N^{-1}$. If $m = 1$, the process is unchanged. □

11. The initial configuration and Markov chain mixing. We have studied the relaxation of the MFZRP by showing that it tends to a fluid limit as the number of boxes tends to infinity, and that the fluid limit converges exponentially to $G^R$. Combining Theorems 1.1 and 1.2, we can conclude that with high probability as $N \to \infty$, the empirical distribution rapidly approaches the geometric distribution $G^R$.

An alternative, and in a sense stronger, description would be provided by finding the total variation mixing time for the finite MFZRP. The process can only be close to the equilibrium distribution on $\Omega_N$ in total variation if the empirical distribution is close to $G^R$ on $\mathbb{N}$ in Kullback–Leibler divergence. This follows from (3.1)–(3.3): if, with probability close to 1, $D_{KL}(X(t)\|G^R)$ is significantly larger than its expected equilibrium value, the process cannot be close to equilibrium at time $t$.

When studying the fluid limit we restricted our attention to the initial distribution that arises when every box starts with $R$ balls in it: $x(0) = \delta_R$. This suggests we ask two different questions about the finite MFZRP. First, what is the (uniform) total variation mixing time $t_1(1/4)$? Second, what is the total variation mixing time when started with $X(0) = \delta_R$?

The answer to the first question is of order between $NR$ and $NR^2\log R$. In [16], the spectral gap for the MFZRP is shown to have order $\Theta(R^{-2})$ uniformly in $N$. Combined with (6.3) and $-\log\pi_{\min} = O(N\log R)$, this provides the upper bound.
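For very small systems the spectral gap can be computed directly. This gives a feel for the $R^{-2}$ scaling, though it says nothing about uniformity in $N$. The sketch enumerates $\Omega_N$ by stars and bars and gives each ordered pair of distinct boxes rate $N^{-1}$, with moves from an empty source being void. It then diagonalizes the resulting generator, which is symmetric because the chain is reversible with respect to the uniform distribution. Sizes are kept tiny since $|\Omega_N|$ grows quickly, and the function names are hypothetical.

```python
import itertools
import numpy as np


def configurations(N, total):
    """All (b_1, ..., b_N) with nonnegative entries summing to `total` (stars and bars)."""
    slots = total + N - 1
    for bars in itertools.combinations(range(slots), N - 1):
        cuts = (-1,) + bars + (slots,)
        yield tuple(cuts[i + 1] - cuts[i] - 1 for i in range(N))


def mfzrp_generator(N, R):
    """Generator of the MFZRP on Omega_N: each ordered pair of distinct boxes at rate 1/N."""
    states = list(configurations(N, N * R))
    index = {b: s for s, b in enumerate(states)}
    Q = np.zeros((len(states), len(states)))
    for s, b in enumerate(states):
        for src in range(N):
            if b[src] == 0:
                continue                    # empty source: nothing moves
            for snk in range(N):
                if snk == src:
                    continue                # same box: no change
                c = list(b)
                c[src] -= 1
                c[snk] += 1
                Q[s, index[tuple(c)]] += 1.0 / N
        Q[s, s] = -Q[s].sum()
    return Q


N = 3
gaps = {}
for R in range(1, 6):
    Q = mfzrp_generator(N, R)
    w = np.linalg.eigvalsh(Q)               # ascending; w[-1] is 0
    gaps[R] = -w[-2]
    print(f"R={R}  |Omega_N|={len(Q)}  gap={gaps[R]:.4f}  R^2*gap={R * R * gaps[R]:.3f}")
```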

Under the MFZRP equilibrium measure, the number of balls in each box is $O(R\log N)$ with probability tending to 1 as $N \to \infty$. This provides a simple lower bound on the convergence time. Start all $NR$ balls in box 1, with the other $N-1$ boxes empty. While, with high probability, box 1 contains more than $O(R\log N)$ balls, the process cannot be close to equilibrium. Box 1 loses balls at rate at most 1, so time $\Omega(NR)$ is needed to get rid of the excess $NR - O(R\log N)$ balls.
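The equilibrium fact used here is easy to probe by simulation. Under the uniform distribution on $\Omega_N$, a configuration corresponds to a uniformly random placement of $N-1$ bars among $NR+N-1$ slots (stars and bars), so it can be sampled exactly. The sketch reports the largest occupancy divided by $R\log N$. It only illustrates the claim and does not prove it; the function name and parameter values are hypothetical.

```python
import math
import random


def uniform_configuration(N, R, rng):
    """Exact sample from the uniform distribution on Omega_N via stars and bars."""
    slots = N * R + N - 1
    bars = sorted(rng.sample(range(slots), N - 1))
    cuts = [-1] + bars + [slots]
    return [cuts[i + 1] - cuts[i] - 1 for i in range(N)]


rng = random.Random(1)
R = 5
for N in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
    b = uniform_configuration(N, R, rng)
    print(f"N={N:>6}  max occupancy={max(b):>4}  "
          f"ratio to R log N={max(b) / (R * math.log(N)):.2f}")
```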

We do not have a satisfactory answer to the second question. In [8], rather than taking the fluid limit, we worked with the entropy of the empirical distribution directly. With probability tending to 1 as $N \to \infty$, $D_{KL}(X(t)\|G^R)$
decreases to 0(i?^A~^ log A) in time 0(i?^logA^). With a little extra care, 
both these bounds can be improved by a factor of R. 